Can Multiple Uses of Biomass Limit the Feedstock Availability for Future Biogas Production? An Overview of Biogas Feedstocks and Their Alternative Uses

Biogas is expected to contribute 10% of the total renewable energy use in Europe in 2030. This expectation largely depends on using several biomass byproducts and wastes as feedstocks. However, the current development of a biobased economy requires biomass for multiple purposes. If alternative applications also use biogas feedstocks, it becomes doubtful whether these feedstocks will be available for biogas production. To explore this issue, this paper provides an overview of potential alternative uses of different biogas feedstocks being researched in the literature. We conducted a literature review using the machine learning technique “co-occurrence analysis of terms”. This technique reads thousands of abstracts and records when a biogas feedstock and a biomass application are mentioned together. Such pairs are assumed to represent the use of the feedstock for the application. We reviewed 109 biogas feedstocks and 217 biomass applications, revealing 1053 connections between them in nearly 55,000 scientific articles. Our results provide two insights. First, a large share of the biomass streams presently considered in biogas estimates have many alternative uses, which likely limits their contribution to future biogas production. Second, some streams with suitable characteristics for biogas production are not considered in present estimates.


Introduction
Interest in biogas has been growing in recent years, particularly due to its potential to contribute to a renewable energy transition. Through anaerobic digestion, biomass is converted into biogas containing methane. This biogas could be used as a substitute for natural gas in providing electricity, heat and methane for chemical uses [1].
In policy documents on Europe's future energy supply, biogas plays an important role. Directive (EU) 2018/2001 promotes biogas as one of the strategic means for the European Union (EU) to reach its target of 32% renewable energy use in 2030 [2] (EU data in this paper include the United Kingdom). In 2030, the expected availability of biogas is 1.7 EJ, equivalent to 3.7% of the EU's energy consumption. Of this contribution, one-third is expected to come from energy crops, while the rest is mainly produced from manure and different types of organic waste [3,4]. In other research, straws, sewage sludge, organic house waste, food waste, waste from landscape [...]
Figure 1. [...] [22] and biomass application categorization used in this research. The pyramid indicates which applications can add the highest value to the biomass: the higher an application sits in the pyramid, the higher its value and the more knowledge and skills are required to use the biomass.
As mentioned earlier, a shift towards a more biobased society will lead to uses of biomass other than the present ones, which will likely affect the availability of biomass for biogas production. Realistic estimates of the future availability of biomass for biogas production should therefore take these new uses into account. Because novel uses take time to develop, we assume that future alternative uses of feedstocks can already be found in the scientific literature. To better understand feedstock competition for biogas production, an overview is needed that identifies all biogas feedstocks and connects them to all potential biomass applications, taking the BVP into account. Such an overview would make it possible to identify the feedstocks that are more likely to remain available for biogas.
Currently, such an overview is lacking. Compiling one is complicated, as it involves reviewing thousands of papers spread across many research niches. Processing this amount of information effectively would be unfeasible without automated assistance.
In this paper, through a machine learning literature review method, we aim to provide an overview of potential alternative uses of different biogas feedstocks being researched in nearly 55,000 scientific papers. In this way, we seek to develop better knowledge about the role of potential biomass competition in the biobased economy, especially for the contribution of biogas production in the future energy transition.

Machine Learning Approach: The Co-Occurrence Analysis of Terms
Machine learning has been defined as "the automated detection of meaningful patterns in data" [23]. This approach combines multiple techniques to extract information from large data sets [23]. In this research, we use the technique "co-occurrence analysis of terms" described by Davis et al. [22]. This technique reads the abstracts of thousands of documents, scans them for terms specified by the researcher, and records co-occurrences of these terms [22]. Here, co-occurring terms are only examined if they come from two separate predefined lists. By making one list of terms describing feedstocks and another describing biomass applications, the technique identifies pairs of feedstocks and applications mentioned together in the literature.
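To make the counting step concrete, the following Python sketch records, for every abstract, which feedstock terms and which application terms it mentions, and stores each cross-list pair together with the ids of the articles in which it occurs. It assumes plain case-insensitive whole-word matching; it is not the R implementation of Davis et al. [22], which also handles term variants, and the function name, term lists, and abstracts are hypothetical.

```python
import re
from collections import defaultdict


def cooccurrences(abstracts, feedstocks, applications):
    """Return {(feedstock, application): [article ids]} for pairs co-mentioned in an abstract.

    Only pairs with one term from each list are recorded; two feedstocks (or two
    applications) mentioned together are ignored.
    """
    def pattern(term):
        # Whole-word, case-insensitive match for a (possibly multi-word) term.
        return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

    feed_patterns = {t: pattern(t) for t in feedstocks}
    app_patterns = {t: pattern(t) for t in applications}
    pairs = defaultdict(list)
    for article_id, text in abstracts.items():
        found_feeds = [t for t, p in feed_patterns.items() if p.search(text)]
        found_apps = [t for t, p in app_patterns.items() if p.search(text)]
        for f in found_feeds:
            for a in found_apps:
                pairs[(f, a)].append(article_id)
    return dict(pairs)


# Toy abstracts for illustration only.
abstracts = {
    "A1": "Pig manure was composted and the compost applied to maize.",
    "A2": "Bioethanol production from maize stover after pretreatment.",
    "A3": "Co-digestion of pig manure and maize stover for biogas.",
}
result = cooccurrences(abstracts, ["pig manure", "maize stover"], ["compost", "bioethanol"])
for pair, ids in sorted(result.items()):
    print(pair, ids)
```

In this toy run the third abstract mentions two feedstocks but no listed application, so it yields no pair: terms from the same list never form a co-occurrence. The stored article ids play the role of the list of literature that accompanies the matrix.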
Davis et al. [22] also used this technique on a list of biomass terms and a list of application terms to identify value pathways for organic wastes. Their results provide almost 2500 connections between 450 organic wastes and 200 applications, a much broader scope than previous review studies covered. Although the authors pointed out that the method does not "extract the nature of relationship between the terms" [22], it does show the potential of this technique to uncover combinations of feedstocks and applications that researchers may not immediately think of.
To provide an overview of the alternative uses of different biogas feedstocks found in the literature, we adapted this technique for our research. We used the co-occurrence analysis algorithm provided by Davis et al. [22] but refined the list of feedstocks to focus it on biogas feedstocks. In addition, we improved the list of applications with a better-structured categorization, as Davis et al. [22] suggested.
Following the guidelines of Davis et al. [22], our research starts with three steps: (1) literature collection; (2) identifying biogas feedstocks and biomass applications; (3) co-occurrence calculation. Steps 1 and 2 are manual collections, described in detail in Section 2.2.
Step 3 applies the co-occurrence calculation algorithm of Davis et al. [22] in R [24]. The source code for the first three steps is available online via the link given in Supplementary Materials Appendix D. The main results of these steps are as follows:
i. A co-occurrence matrix: one large grid matrix shows which biogas feedstocks are mentioned together with which applications. We define a co-occurrence as a pair of a specific feedstock and a specific application that are co-mentioned at least once in the literature (e.g., maize stover and bioethanol, pig manure and compost). If the included literature contains at least one co-occurrence of a pair, the connection between the feedstock and the application is shown in the matrix. The order of the rows and columns (feedstocks and applications) is based on the similarity of the terms in the literature collection, so that similar types of feedstocks suitable for an application appear near each other, as do similar types of applications requiring a feedstock. This ordering is done in two steps: first, an automated ordering based on hierarchical clustering of the row and column values in the original matrix output by the algorithm [22]; second, a manual check and adjustment of the ordering.
ii. A list of literature: this indicates the exact articles in which each co-occurrence happened.
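The automated part of the ordering can be sketched as follows. This Python example builds the binary matrix from a set of pairs and orders rows and columns by the leaf order of a naive average-linkage hierarchical clustering on Jaccard distances. The linkage method, the distance measure, and the toy pairs are our assumptions; the algorithm of Davis et al. [22] may differ, and the manual adjustment step is not modeled.

```python
def binary_matrix(pairs, feedstocks, applications):
    """Rows are feedstocks, columns are applications; 1 marks at least one co-occurrence."""
    return [[1 if (f, a) in pairs else 0 for a in applications] for f in feedstocks]


def jaccard_distance(u, v):
    """1 - |shared 1s| / |positions where either vector has a 1| (0 for two empty vectors)."""
    either = sum(1 for x, y in zip(u, v) if x or y)
    both = sum(1 for x, y in zip(u, v) if x and y)
    return 0.0 if either == 0 else 1.0 - both / either


def cluster_order(vectors):
    """Leaf order of a naive average-linkage agglomerative clustering."""
    clusters = [[i] for i in range(len(vectors))]  # each cluster lists its leaves in order

    def average_distance(c1, c2):
        total = sum(jaccard_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters; ties go to the first pair found.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0] if clusters else []


# Toy pairs for illustration only; not results from the literature collection.
feedstocks = ["pig manure", "wheat straw", "cattle manure", "maize stover"]
applications = ["compost", "bioethanol", "animal feed", "biochar"]
pairs = {("pig manure", "compost"), ("cattle manure", "compost"),
         ("wheat straw", "bioethanol"), ("maize stover", "bioethanol"),
         ("wheat straw", "animal feed"), ("maize stover", "animal feed"),
         ("pig manure", "biochar")}

matrix = binary_matrix(pairs, feedstocks, applications)
row_order = [feedstocks[i] for i in cluster_order(matrix)]
columns = [list(col) for col in zip(*matrix)]  # transpose to cluster applications
col_order = [applications[j] for j in cluster_order(columns)]
print(row_order)  # ['wheat straw', 'maize stover', 'pig manure', 'cattle manure']
print(col_order)  # ['bioethanol', 'animal feed', 'compost', 'biochar']
```

Rows with identical patterns end up adjacent, as do the two manures. The exhaustive pair search scales poorly but is adequate for a matrix of a few hundred rows and columns.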
Furthermore, we are aware that the output of the "co-occurrence analysis of terms" algorithm by Davis et al. [22] is simply the co-mentioning of each word pair and does not explicitly indicate the relation between the terms. Therefore, we added a fourth step, (4) co-occurrence validation, which supports interpreting the co-occurrences as potential alternative uses of the biogas feedstocks. This step is described in more detail in Section 2.3. See Figure 2 for an overview of the research methodology.

Collecting Literature on Biomass Applications
The aim of this step is to collect a large number of articles that discuss applications of biomass. We ran the search query given below on ScienceDirect, a large scientific publication database whose topics range from Physical Sciences and Engineering, Life Sciences, and Health Sciences to Social Sciences and Humanities [25]. This large, broad-scope collection, together with the fact that the database allows thousands of abstracts to be downloaded automatically [26], makes ScienceDirect suitable for collecting literature for this research.


Search query: queryString = "title-abs-key (technology OR process OR conversion OR treatment OR use OR production OR application) AND title-abs-key (product OR waste OR by-product OR byproduct OR feedstock OR additive OR catalyst) AND title-abs-key (organic OR bio OR biomass)".
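The query is a conjunction of three groups of alternative keywords. A small Python sketch that rebuilds the string from those groups can make it easier to add or drop terms; `or_group` is a hypothetical helper, and the sketch only builds the string, it does not call any ScienceDirect API.

```python
def or_group(terms):
    """Wrap alternative keywords in one title-abs-key(...) clause."""
    return "title-abs-key (" + " OR ".join(terms) + ")"


# The three keyword groups of the query above; every group must be matched (AND).
groups = [
    ["technology", "process", "conversion", "treatment", "use", "production", "application"],
    ["product", "waste", "by-product", "byproduct", "feedstock", "additive", "catalyst"],
    ["organic", "bio", "biomass"],
]
query_string = " AND ".join(or_group(g) for g in groups)  # corresponds to queryString
print(query_string)
```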

Identifying Biogas Feedstocks
The list of biogas feedstocks used in this research is mainly based on the guidance of the Food and Agriculture Organization (FAO) Biogas Industrial User Manual [27,28] and the European Feedstock Atlas [29,30]. The lists of biogas feedstocks suggested by these two reports are quite close to each other. They also cover a large variety of byproducts and waste biomass which are potentially usable for European biogas production and have been discussed in literature. In these reports, the feedstocks are grouped into different categories reflecting the type of process producing them. For example, wheat straw belongs to the category of Crop leftovers and coconut extraction meal belongs to the category of Vegetable oil production.
In the FAO manual and the European Feedstock Atlas, each recommended biogas feedstock is a unique stream, distinguished from the others by its biogas yield and by physical and chemical characteristics such as dry matter and volatile solids content. This level of detail is understandable because these are technical documents.
In our research, however, the terms must match the names that are likely to appear in the literature. We identified four confusing situations and grouped certain feedstocks to address them:


• A generic stream and a specific stream: some streams are general but seem to overlap with other streams, such as dairy industry waste, which may conceptually include cheese waste. However, in the context of their applications they are different streams, so if the two main lists mentioned both streams, we kept both the generic and the specific terms.
• Streams with names that may be interchangeable in the literature but have very different physical properties according to the two main lists, such as beer barm and brewer's grains. We kept them as two separate terms in our list.
• Streams with names that may be interchangeable in the literature and have similar physical properties according to the two main lists (e.g., maize straw and maize stover), or that differ only in water content (e.g., cattle slurry and cattle manure). We combined them into one term.
• Streams that are rarely found in the scientific literature on biomass applications, such as "barley feeding meal". We excluded these from our lists.
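The four rules amount to a synonym table plus an exclusion list. A minimal Python sketch, using only the examples named above; which name survives a merge (here "maize stover" and "cattle manure") is our assumption, and `canonical_feedstock` is a hypothetical helper.

```python
# Hypothetical encoding of the four grouping rules, using the examples from the text.
CANONICAL = {
    # Rule 1: a generic and a specific stream are both kept as separate terms.
    "dairy industry waste": "dairy industry waste",
    "cheese waste": "cheese waste",
    # Rule 2: similar names but different physical properties -> kept separate.
    "beer barm": "beer barm",
    "brewer's grains": "brewer's grains",
    # Rule 3: interchangeable names with similar properties -> merged into one term
    # (the surviving name is an assumption).
    "maize straw": "maize stover",
    "maize stover": "maize stover",
    "cattle slurry": "cattle manure",
    "cattle manure": "cattle manure",
}
# Rule 4: streams rarely found in the application literature are excluded.
EXCLUDED = {"barley feeding meal"}


def canonical_feedstock(name):
    """Return the single term used for a feedstock name, or None if it is excluded."""
    key = name.strip().lower()
    if key in EXCLUDED:
        return None
    return CANONICAL.get(key, key)


print(canonical_feedstock("Maize straw"), canonical_feedstock("barley feeding meal"))
```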
To verify whether terms were similar, or to learn more about the feedstocks, we compared them using Google, various literature sources, and several feedstock websites such as Feedipedia [31] and Feedbase [32]. Additionally, we added biomass sources mentioned in [33] and [34] that were not included in the FAO Biogas Industrial User Manual or the European Feedstock Atlas.
Due to the criticisms mentioned in the introduction, energy crops are excluded in this research.

Identifying Biomass Applications
The Biomass Value Pyramid (BVP) categorizes the final uses of biomass into groups based on their relative economic value. The detailed categorization varies across the literature, but six groups are usually mentioned, from high to low value: Pharmaceuticals, Human Food, Animal Feed, Chemicals, Materials, and Energy [20,21,35]. This indicates which literature to look at when inventorying the specific applications. Inspired by the BVP, we categorized the specific applications into five groups: Animal Feed, Chemicals, Chemical and Energy, Materials, and Energy. "Food" is omitted because we only considered byproducts and waste streams. "Pharmaceuticals and high-end chemicals" is merged with "Chemicals", because biomass requires chemical processing before being used in these higher-value applications. This reflects a broader issue: for many chemicals, it is almost impossible to determine the end application of the substance. Additionally, we distinguished the group Chemical and Energy because some chemicals, such as biomethane and biohydrogen, can be used either for energy or in further chemical industrial processes (see Figure 1). This means that the group "Energy" only includes applications that are not also used as chemicals.
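The rule that "Energy" excludes applications also used as chemicals can be expressed with set operations. In this Python sketch the application terms and their assignment are illustrative assumptions, not the paper's lists, and we assume the "Chemicals" group likewise excludes the shared applications so that the groups do not overlap.

```python
# Illustrative assignment only (not the paper's application lists).
used_as_chemical = {"biomethane", "biohydrogen", "acetone", "fatty acids"}
used_as_energy = {"biomethane", "biohydrogen", "torrefied biomass", "biodiesel"}

# Applications usable both ways form the "Chemical and Energy" group.
chemical_and_energy = used_as_chemical & used_as_energy
# "Energy" keeps only applications that are not also used as chemicals;
# "Chemicals" is treated symmetrically (an assumption) so the groups are disjoint.
energy = used_as_energy - chemical_and_energy
chemicals = used_as_chemical - chemical_and_energy

print(sorted(chemical_and_energy))  # ['biohydrogen', 'biomethane']
print(sorted(energy))               # ['biodiesel', 'torrefied biomass']
print(sorted(chemicals))            # ['acetone', 'fatty acids']
```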
To define the terms for each application, we used a different strategy for each category:
• Animal Feed: feedstocks that can be fed to animals raw or after processing. Although processing may involve multiple steps with intermediate components, we only considered whether those components led to animal feed. As a result, we grouped terms found in the literature, such as forage, fodder, and animal feed supplements, into a single term: animal feed.
• Chemicals: we used the original terms from the approach of Davis et al. [22], supplemented by terms from other literature.
• Energy: unlike chemicals and materials, which have many conversion processes, bioenergy has relatively few and straightforward conversion processes. In the literature, energy applications can be described either by end-use products or by conversion processes, for example, torrefied biomass and torrefaction. We made an inventory of bioenergy processes to make sure that we did not miss any bioenergy applications, and then defined the terms for energy applications by their end products.
• Chemical and Energy: as with Energy, the terms in this group have few conversion processes and can be described either by end-use products or by conversion processes. We defined the terms for chemical and energy applications using the end products.
The references used to identify the biomass applications are listed in Supplementary Materials Appendix B.

Variants of the Terms
As mentioned above, a specific feedstock or application can be described by several synonyms. In processing the literature to identify co-occurrences, we consider all of these synonyms as well as other term variants, such as plural and singular forms and differing adjectives. In presenting the results, we use a single term for each biogas feedstock and each biomass application. The technique of Davis et al. [22] includes an algorithm that helps generate the variants of the considered terms in the literature collection and groups the results into one single term. Note that while for the other categories the term variants are defined only by the end products, the variants for the Energy and the Chemical and Energy applications are determined by both the end products and the conversion processes.
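Davis et al. [22] supply the variant-generation algorithm; we do not reproduce it here. The Python sketch below only illustrates the idea under simple assumptions: naive singular/plural rules, hand-listed synonyms (including a conversion process as a variant of an energy end product), and one regular expression whose matches are mapped back to a single presentation term. All names are hypothetical.

```python
import re


def variants(term):
    """Singular/plural spelling variants under naive English rules (an assumption)."""
    if term.endswith("ss"):  # "biomass" -> "biomasses", never "biomas"
        return {term, term + "es"}
    if term.endswith("s"):   # "pellets" -> "pellet"
        return {term, term[:-1]}
    return {term, term + "s"}  # "fodder" -> "fodders"


def build_matcher(synonyms_by_term):
    """Compile one regex over all variants; the lookup maps each variant to its single term."""
    lookup = {}
    for term, synonyms in synonyms_by_term.items():
        for name in synonyms | {term}:
            for form in variants(name):
                lookup[form.lower()] = term
    # Try longer forms first so a longer variant wins over a shorter overlapping one.
    alternatives = sorted(lookup, key=len, reverse=True)
    regex = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE)
    return regex, lookup


def terms_in(text, regex, lookup):
    """Single presentation terms mentioned in a text, whatever variant was used."""
    return {lookup[m.group(1).lower()] for m in regex.finditer(text)}


# Hypothetical synonym lists: feed terms grouped into "animal feed", and a
# conversion process used as an extra variant of an energy end product.
synonyms = {
    "animal feed": {"forage", "fodder"},
    "torrefied biomass": {"torrefaction"},
}
regex, lookup = build_matcher(synonyms)
text = "Fodders and forage were compared with pellets produced by torrefaction."
print(sorted(terms_in(text, regex, lookup)))  # ['animal feed', 'torrefied biomass']
```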

Co-Occurrence Validation
Our overview of potential alternative uses of biogas feedstocks depends on the assumption that a co-occurrence represents a connection in which the feedstock is used as input for the application. However, a feedstock and an application can be mentioned together for other reasons, especially when the application refers to an end product. For example, the co-mentioned product may be a substance required to process the feedstock. Therefore, we manually checked a sample of the co-occurrences to see to what extent the co-occurrences in our dataset reflect the assumed connection. We performed this manual check for four types of co-occurrence:
i. The 20 most frequent co-occurrences in the literature collection;
ii. 20 random co-occurrences that happened only once in the literature collection. From the list of such co-occurrences, ordered alphabetically by feedstock term, we randomly chose the first co-occurrence to check and then took the co-occurrence 20 positions below the previous one, repeating this until 20 co-occurrences were collected. If a collected co-occurrence was too similar to one collected earlier, we replaced it with the next co-occurrence in the list;
iii. 20 random co-occurrences that were unexpected given the authors' knowledge, for example, co-occurrences where the feedstock is considered waste and the application belongs to a higher level of the BVP. The list of these co-occurrences was also ordered alphabetically by feedstock term;
iv. All co-occurrences of the feedstocks with the lowest number of co-occurring applications.
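The selection procedure in (ii) and (iii) is a form of systematic sampling with a random start. The Python sketch below assumes that pairs are sorted by feedstock and then by application, that the walk wraps around to the top of the list if it runs past the end, and that `too_similar` is a caller-supplied, hypothetical judgment standing in for the authors' manual decision; the paper does not specify these details.

```python
import random


def systematic_sample(pairs, n=20, step=20, rng=None, too_similar=None):
    """Random start, then every `step`-th pair down an alphabetical list.

    A pick that duplicates or is too similar to an earlier pick is replaced by the
    pair just below it. Wrapping past the end of the list is an assumption.
    """
    rng = rng if rng is not None else random.Random()
    too_similar = too_similar or (lambda a, b: False)
    ordered = sorted(pairs)  # alphabetical by feedstock term, then application
    if len(ordered) < n:
        raise ValueError("fewer pairs than requested sample size")
    chosen, i, skips = [], rng.randrange(len(ordered)), 0
    while len(chosen) < n:
        candidate = ordered[i % len(ordered)]
        if candidate in chosen or any(too_similar(candidate, c) for c in chosen):
            i, skips = i + 1, skips + 1  # take the co-occurrence just below instead
            if skips > len(ordered):
                raise ValueError("not enough sufficiently different pairs")
            continue
        chosen.append(candidate)
        i, skips = i + step, 0
    return chosen


# Hypothetical once-only co-occurrences; two applications per feedstock.
once_only = [(f"feedstock {k:03d}", app) for k in range(300) for app in ("compost", "biochar")]


def same_feedstock(a, b):
    return a[0] == b[0]


sample = systematic_sample(once_only, rng=random.Random(7), too_similar=same_feedstock)
print(len(sample), sample[:2])
```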
The first two validations check whether a co-occurrence found frequently in the literature represents the use of a feedstock for an application more reliably than one found only once. The third validation examines whether the unexpected co-occurrences include ones that represent the assumed connection. The fourth validation was performed because the interpretation of feedstocks with very few co-occurring applications is likely more sensitive to mistaken co-occurrences than that of feedstocks with many co-occurring applications.
To validate each co-occurrence, we used the list of literature (Supplementary Materials Appendix E), a product of the co-occurrence calculation algorithm, to locate the exact articles in which the co-occurrence happened. We then read the abstracts to understand the connection between the feedstock and the application. The validation of a co-occurrence stops as soon as one abstract confirms the assumed connection. If none of the located articles confirms it, we consider the co-occurrence a false positive.
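The stopping rule can be sketched in Python as a short-circuiting check over the located abstracts. The `confirms` callback is hypothetical: it stands in for the manual reading of an abstract, which the authors did by hand. The helper `share_confirmed` computes the share of checked co-occurrences that represent the assumed connection, the kind of percentage reported for the validation groups.

```python
def validate(article_ids, confirms):
    """True as soon as one located abstract confirms the feedstock-as-input connection.

    any() stops at the first True, mirroring the stopping rule; if no abstract
    confirms the connection, the co-occurrence counts as a false positive (False).
    """
    return any(confirms(aid) for aid in article_ids)


def share_confirmed(outcomes):
    """Fraction of checked co-occurrences that represent the assumed connection."""
    if not outcomes:
        raise ValueError("no co-occurrences checked")
    return sum(outcomes) / len(outcomes)


# Hypothetical manual judgments for the three abstracts of one co-occurrence.
judgments = {"A1": False, "A2": True, "A3": True}
read = []


def confirms(article_id):
    read.append(article_id)  # record which abstracts had to be read
    return judgments[article_id]


print(validate(["A1", "A2", "A3"], confirms), read)  # True ['A1', 'A2']
print(share_confirmed([True, False, True, True]))    # 0.75
```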

Results of Literature Collection and Identifying Biogas Feedstocks and Biomass Applications
We collected the abstracts of 54,322 distinct articles about biomass applications from ScienceDirect, published between 1970 and 2018. We also identified 109 biogas feedstocks, all of them byproduct and waste streams, and 217 biomass applications. These are the inputs for generating the co-occurrence matrix described in Section 3.3. Table 1 shows the energy application terms and the conversion processes used to generate additional term variants to capture these applications in the literature. Detailed lists of applications and feedstocks, their term variants, and references are presented in Supplementary Materials Appendices A and B.

Appeared Terms in the Co-Occurrences, Co-Occurrence Validation and Unexpected Connections
Two-thirds of the feedstocks and applications identified in step two showed up in the co-occurrences: 71 of 109 biogas feedstocks and 150 of 217 applications. This means that these 71 feedstocks and 150 applications have in some way been mentioned together in the literature collection. The remaining feedstocks and applications might be mentioned individually in the literature collection, but never together with a term from the other list to form a feedstock-application co-occurrence. Some feedstocks that did not appear in the co-occurrences have physical properties similar to those that did. For example, wheat straw appeared in the co-occurrences but oat straw did not. All categories of feedstocks and applications are represented in the co-occurrences.
The feedstocks and applications that appeared form 1053 co-occurrences, half of which happened only once in the literature. Of the 1053 co-occurrences, 102 were manually checked according to the four criteria described above. In total, 65% of the checked co-occurrences represented the assumed connection that the feedstock is an input for the co-occurring application. This percentage varied between the four validated groups: 95% for the most frequent co-occurrences, 40% for the co-occurrences that happened once, 58% for the unexpected co-occurrences, and 66% for the co-occurrences of the feedstock group with the lowest number of co-occurring applications. On the one hand, the first two groups show that co-occurrences that happened once are more likely to be mistaken than those that happened more frequently. On the other hand, 40% is still a substantial share that would be missed if co-occurrences found only once were disregarded. The last two groups show that, despite many false positives, they include sizable numbers of co-occurrences that do represent the assumed connection.
Among the co-occurrences that did not match the assumed connection, some appeared frequently in the literature collection and some belonged to the unexpected group. The frequent false positives we observed tend to appear in the literature with a fairly consistent, but different, connection. For example, glycerol and acetic acid are often mentioned together because they are commonly mixed in chemical processes [36,37]. Most mismatched co-occurrences in the unexpected group involve mixed wastes, where the application term actually refers to another component of the waste stream. For example, organic house waste co-occurred with different types of bioplastics [38][39][40].
However, the co-occurrences also reveal unexpected connections that truly represent the assumed connection. For example, rice straw, a cellulose-rich material, can be used in fatty acid extraction: one article describes an experiment that successfully extracted fatty acids from a type of bacteria grown on a rice straw substrate, for biofuel production in a manner similar to algal or microbial oil [41]. Another example is that different types of manure can be turned into biodiesel, pyrolysis oil, acetone, and methyl esters [42][43][44][45]. For more details on co-occurrence validation, see Supplementary Materials Appendix C.

Co-Occurrence Matrix
In the co-occurrence matrix (Figure 3 and Supplementary Materials Appendix D), each row represents one biogas feedstock and each column represents one biomass application. A black cell at the intersection of a row and a column indicates that the feedstock and the application co-occurred at least once in the literature collection; a white cell means that they did not co-occur. Several clusters can be seen in the matrix. The feedstock groups at the two ends of the feedstock axis co-occurred with many applications across all application categories. The first group consists of homogeneous streams with a high content of a specific substance, such as lipids (glycerol, animal fats, fish meal), protein (distilled grains), fiber (maize stover, wheat straw), sugar (sugar beet molasses), or nitrogen (different types of manure). The second group consists of mixed streams such as organic waste and food leftovers. This could be expected: a high concentration of a desired substance makes it easier to extract, and mixed streams might contain multiple desirable substances to extract.
With regard to the applications, animal feed and the group of fertilizer, soil improver, and compost co-occurred with a quarter of the biogas feedstock streams. Different types of bioenergy, except bio-ethers and bio jet fuel, also co-occurred with many biogas feedstocks. In addition, certain material and chemical applications stand out, appearing in many of the co-occurrences.

Top Biogas Feedstocks Having Highest and Lowest Number of Co-Occurring Applications
In the results of the co-occurrence calculation, the number of applications that a feedstock co-occurred with ranges from 0 to 110. Almost half of the reviewed feedstocks have at least four co-occurring applications. We considered feedstocks co-occurring with at least 20 applications to have a high number of co-occurring applications and feedstocks co-occurring with three or fewer applications to have a low number; feedstocks in between (4 to 19 applications) have a medium number.
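The thresholds translate into a small Python function. The "medium" band (4 to 19 applications) follows from the two stated thresholds; the helper names and the toy streams are hypothetical and do not reproduce the paper's counts.

```python
from collections import Counter


def classify(n_applications):
    """High: at least 20 co-occurring applications; low: 3 or fewer; otherwise medium."""
    if n_applications < 0:
        raise ValueError("count cannot be negative")
    if n_applications >= 20:
        return "high"
    if n_applications <= 3:
        return "low"
    return "medium"


def classify_feedstocks(pairs, feedstocks):
    """Count distinct co-occurring applications per feedstock, including those with none."""
    counts = Counter(feedstock for feedstock, _ in set(pairs))
    return {f: classify(counts[f]) for f in feedstocks}


# Toy pairs: 25 applications for stream A, 5 for stream B, none for stream C.
pairs = [("stream A", f"app {k}") for k in range(25)]
pairs += [("stream B", f"app {k}") for k in range(5)]
print(classify_feedstocks(pairs, ["stream A", "stream B", "stream C"]))
# {'stream A': 'high', 'stream B': 'medium', 'stream C': 'low'}
```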

Discussion
The goal of this research was to provide an overview of alternative uses of biomass streams suitable for biogas production. This overview should provide insight into whether biomass streams are likely to be available for future biogas production or whether competition with other uses will emerge. To do so, we used a machine learning technique that made it possible to review nearly 55,000 papers. The nature of this review differs from that of a conventional review, in which researchers read each paper.

Limitations and Recommendations
There are two main limitations of the methodology in this research. On the one hand, the overview includes co-occurrences that do not represent the assumed connection; our validation identified false positives within the checked subset of co-occurrences. False positives make the number of alternative uses identified with the co-occurrence analysis higher than the actual number of uses. Improving the precision of the term variants would likely reduce the share of false positives, but it is unlikely to approach zero, because co-occurrences with other strong connections will still show up. An overview without any false positives would likely require additional techniques that could identify such unwanted connections in advance and filter them out of the results of the co-occurrence calculation.
On the other hand, some alternative uses of feedstocks that have been researched in the literature were not identified in our results. For example, oat straw had no co-occurrences even though it can be composted [46]. Such false negatives mean that feedstocks have more actual uses than our method identified. False negatives can have two causes: either the literature discussing these connections is not included in the database, or the term variants are not precise enough to capture the connection. The overview presented therefore does not cover all relations, and anyone interested in the use of a specific biomass stream should carry out an additional literature search with other search queries.
A further limitation concerns the scope of this study. In this research we only consider current biogas feedstocks and study their potential future uses. However, the bioeconomy will also introduce new ways of using materials, which will create new biomass streams that could be used for biogas production. Some research has examined the biogas potential of such waste streams [47,48], but no overview of these new potential streams exists. For more insight into the future availability of biomass for biogas, we recommend further research on these future streams and their alternative uses.

Robustness of the Research
The limitations above affect the exact number of alternative uses of the biogas feedstocks in our overview. However, given the sheer size of the analyzed literature, the classification of the number of alternative uses into high, medium, and low should be robust. Even if only 65% of the co-occurrences represent the assumed connection, feedstocks with high numbers of alternative uses will still have many more co-occurrences than the rest. Likewise, feedstocks with low numbers of co-occurrences are unlikely to be missing enough alternative uses to move them into a higher class. In other words, false positives and false negatives will have little impact on the classification, so the big picture of the overview is robust.

Implications for the Expectation of Future Biogas Production in Europe
Our results show that many biogas feedstocks might have multiple alternative uses, and many of these uses have a higher value than biogas according to the BVP. This group of feedstocks overlaps with many of the main feedstocks underlying European biogas expectations. For instance, 8 of the 16 biogas feedstock groups suggested by the EU New Renewable Energy Directive [2] likely have a high or medium number of alternative uses. Only for roadside grass and natural grass did we find limited alternatives. If we assume that biomass streams with a high or medium number of alternative uses are not available for biogas production, this will have an enormous impact on the biogas potential: in the two studies that explicitly report the contribution of each feedstock to the total biogas estimate [3,5], the potential would drop by 55% to 80%. Combined with the fact that present estimates of the biogas potential include energy crops, which are under heavy debate, our research indicates that the existing EU estimate of 1.7 EJ from biogas in 2030 is rather optimistic. In addition, Table 4 lists other feedstocks that, according to our results, likely have a low number of alternative uses and might be interesting for biogas production.
Feedstock group [references] | Likely number of alternative uses
[...] [2,3,7] | High
Varied industrial by-products [2,6,7] | High
Varied industrial by-products [2,6,7] | Medium
Sewage sludge [2,3,7] | Medium
Natural grass [5,7] | Low
Roadside grass [3,7] | Low

Potential Use of the Overview in Research on Biogas Feedstock Competition in a Biobased Economy
Our research can help frame biogas research and expectations in the context of a broader transition to a biobased economy. Current research on biomass competition typically reviews 20 to 50 applications or aggregated groups of feedstocks [49][50][51][52]. By reviewing 109 biogas feedstocks and 217 biomass applications, we identified 1053 individual connections between them in the literature, far more than a typical review covers. In addition, we were able to differentiate groups of feedstocks based on their likely number of alternative applications and on whether these uses are higher up the BVP. Thus, our overview can help research on and expectations of future biogas production take into account the alternative uses of biomass within the biobased economy.
Because research within specific niches might require a higher level of detail for a few feedstock-application connections, our research can also be seen as a filtering tool to identify relevant literature. The machine learning approach collected more than 50,000 articles containing information on biomass applications and narrowed them down to about 3000 articles that include likely relevant co-occurrences. Reducing the number of articles by more than an order of magnitude turns an unfeasible amount of manual reviewing into a potentially manageable task. When looking at specific feedstocks or applications, the set of articles is reduced even further. Without this machine learning technique, providing a similar overview would be extremely time-consuming or significantly less comprehensive.

Conclusions
In this paper, we provide an overview of biomass streams that can be used for biogas production and their alternative uses. By using the machine learning technique "co-occurrence analysis of terms", the study was able to process a substantial amount of academic literature and identify more than a thousand connections between biogas feedstocks and potential biomass applications.
The overview provides two insights. First, a large share of the biomass streams presently considered in biogas potential evaluations have many alternative uses in a future biobased economy. In particular, composting, fertilizer, and soil amendment applications and bioenergy applications are likely to compete with biogas for biomass feedstocks, which indicates that the contribution of these streams to future biogas production is likely to be lower. Second, some streams with suitable characteristics for biogas production are not considered in present policy documents. This shows the advantage of using a value-free machine learning process that is able to think outside the box.