Scientific Attention to Sustainability and SDGs: Meta-Analysis of Academic Papers

Abstract: Scientific research plays an important role in the achievement of a sustainable society. However, grasping the trends in sustainability research is difficult because studies are not devised and conducted in a top-down manner with Sustainable Development Goals (SDGs). To understand the bottom-up research activities, we analyzed over 300,000 publications concerned with sustainability by using citation network analysis and natural language processing. The results suggest that sustainability science’s diverse and dynamic changes have been occurring over the last few years; several new topics, such as nanocellulose and global health, have begun to attract widespread scientific attention. We further examined the relationship between sustainability research subjects and SDGs and found significant correspondence between the two. Moreover, we extracted SDG topics that were discussed following a convergent approach in academic studies, such as “inclusive society” and “early childhood development”, by observing the convergence of terms in the citation network. These results are valuable for government officials, private companies, and academic researchers, empowering them to understand current academic progress along with research attention devoted to SDGs.


Introduction
Scientific achievement in various fields is essential in ensuring a sustainable society. For example, studies focusing on more efficient energy systems are valuable to prevent additional global warming. According to a United Nations (UN) report [1], preventing global warming is technically achievable but requires "unprecedented and urgent actions", including political agreement. Changes to the environment are foreseen to have a severe impact, especially on people in vulnerable countries [2]. Accordingly, diverse research fields need to collaborate to achieve a sustainable society. Currently, specific scientific contributions to a sustainable society, such as those that pertain to renewable energy [3], are receiving worldwide attention [4]; however, other topics have not been recognized in the scientific community or highlighted in the daily news. Apart from this, many research objectives and subjects are emerging, not only in materials science and urban engineering but also in fields such as political science and biochemistry. Therefore, it has become important to comprehend the complicated and dynamic sustainability research landscape to evaluate current scientific progress and create a research plan that works toward a sustainable society.
Scientific research that is aimed at achieving sustainability is composed of diverse topics and approaches. For instance, various kinds of studies such as material/bioscience [5,6], industrial engineering [7], operation research [8], and economics [9,10] play important roles in improving have emerged. Our comparison of the SDGs to the academic clusters indicated that a significant correspondence exists between selected areas of study and certain SDGs. This suggests that some SDGs have corresponding research clusters. We also extracted those topics of the SDGs that are discussed in a convergent manner in academic studies, such as "inclusive society" and "early childhood development", by observing the term convergence in the citation network. Some sub-topics of SDGs were discussed in academic studies even when these sub-topics did not form an academic cluster in the citation networks. Our results offer an at-a-glance overview of the dynamically changing research field of sustainability, and the current research focus on SDGs.

Data
A widely accepted consensus concerning the definition of sustainability research does not yet exist. Given the potential practical value of research for achieving a sustainable society and the interrelatedness of scientific knowledge, many studies can be considered related to sustainability science. In practice, we could run a sustainability search query on a bibliometric database; however, it appears to be challenging to reach a consensus concerning the nature of the query. Therefore, we decided to examine papers that explicitly used the terms "sustainability" or "sustainable". In other words, we investigated the scientific attention paid to sustainability. In total, 514,480 papers were retrieved from the Scopus database on 2 January 2020. Figure 1 shows that the number of publications has dramatically increased each year. We analyzed the 312,584 papers in the largest connected component of the citation network to discard unrelated papers that simply used the term "sustainab*".
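The filtering step described above, keeping only the papers in the largest connected component of the citation network, can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function name, the toy paper IDs, and the choice to ignore citation direction are all assumptions.

```python
from collections import defaultdict, deque


def largest_connected_component(citations):
    """Return the papers in the largest connected component.

    `citations` is an iterable of (citing_paper, cited_paper) pairs.
    Direction is ignored (an assumption): two papers are linked if
    either one cites the other.
    """
    neighbors = defaultdict(set)
    for citing, cited in citations:
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)

    visited = set()
    largest = set()
    for start in list(neighbors):
        if start in visited:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        visited |= component
        if len(component) > len(largest):
            largest = component
    return largest


# Hypothetical toy data: p1-p3 are linked by citations, p4-p5 form a small island.
toy_citations = [("p1", "p2"), ("p3", "p2"), ("p4", "p5")]
core_papers = largest_connected_component(toy_citations)
```

Papers that neither cite nor are cited by another retrieved paper never enter the network and are therefore dropped, which matches the stated aim of discarding papers that merely mention the search term.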
Data relating to the 17 SDGs were retrieved from the UN description of each SDG [18] to ensure the analysis was accurate. We used a document that described the progression of information from 2016 to 2019 and targeted the indicators of each SDG. In this way, we obtained a document of approximately 300-800 words for each of the 17 SDGs.

Constructing the Landscape and Detecting the Edge Area
The process to construct the landscape of academic publications is as follows: (1) perform citation network clustering; and (2) extract representative terms and fundamental data (e.g., keywords and the average published year) from each cluster.

Citation Network Clustering
The nodes of the network were divided into clusters using the Leiden clustering method [24]. This method searches for the set of clusters that maximizes the modularity Q value [25]. The modularity Q compares the density of edges between nodes of the same cluster with the density expected under a random node assignment that maintains the sizes of the clusters. The Leiden method offers better clustering accuracy and processing speed than the Louvain method [26], which is widely used in network analyses. After computing the clusters of sustainability papers, the results showed that a few clusters contained a large number of papers. To perform a detailed analysis, we applied the clustering recursively to identify the sub-clusters of papers in each cluster that contained more than 1000 papers.
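To make the optimization target concrete, the sketch below evaluates the modularity Q of a given partition using the standard undirected form, Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside cluster c, d_c is the sum of the degrees of its nodes, and m is the total number of edges. It does not implement the Leiden search itself; the function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    `edges` is a list of (u, v) pairs; `cluster_of` maps node -> cluster label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal_edges = defaultdict(int)  # L_c: edges with both ends in cluster c
    degree_sum = defaultdict(int)      # d_c: sum of node degrees in cluster c
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal_edges[cu] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_split = modularity(toy_edges, two_clusters)   # the natural split scores higher
q_merged = modularity(toy_edges, one_cluster)   # a single cluster scores zero
```

A clustering method such as Leiden searches over partitions for the one with the highest Q; applying the same procedure again inside any cluster with more than 1000 papers yields the sub-clusters described above.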

Extracting Representative Terms and Fundamental Data
We used natural language processing to extract representative terms for each cluster. If all n-grams (contiguous sequences of n words) are used as terms, non-term sequences of words such as "is a" may be extracted, and words such as "algorithm" and "algorithms" may be considered different. Therefore, we lemmatized words using the NLTK WordNet Lemmatizer [27] and extracted the terms that matched the part-of-speech pattern [28] (<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+. This led us to extract terms such as "high accuracy algorithm" and to exclude terms such as "is different". Then, we singularized the retrieved terms using the Python inflection library.
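The part-of-speech filter above can be illustrated with a small standard-library sketch. It assumes the tokens have already been tagged with Penn Treebank tags (here supplied by hand rather than by a tagger), and it translates the chunk pattern into an ordinary regular expression over the tag sequence; this translation and the function name are assumptions for illustration only.

```python
import re

# (<JJ>* <NN.*>+ <IN>)? <JJ>* <NN.*>+ rewritten as a regex over "<TAG><TAG>..." strings.
TERM_PATTERN = re.compile(
    r"(?:(?:<JJ>)*(?:<NN[^>]*>)+<IN>)?(?:<JJ>)*(?:<NN[^>]*>)+"
)


def extract_terms(tagged_tokens):
    """Return the word sequences whose tags match the term pattern.

    `tagged_tokens` is a list of (word, tag) pairs for one sentence.
    """
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    terms = []
    for match in TERM_PATTERN.finditer(tag_string):
        # Counting "<" converts character offsets back to token indices.
        first = tag_string[:match.start()].count("<")
        last = tag_string[:match.end()].count("<")
        terms.append(" ".join(word for word, _ in tagged_tokens[first:last]))
    return terms


# Hand-tagged examples (hypothetical input; a POS tagger would normally supply the tags).
kept = extract_terms([("high", "JJ"), ("accuracy", "NN"), ("algorithm", "NN")])
dropped = extract_terms([("is", "VBZ"), ("different", "JJ")])
with_preposition = extract_terms(
    [("reduction", "NN"), ("of", "IN"), ("disaster", "NN"), ("risk", "NN")]
)
```

The optional first group is what allows noun phrases joined by a preposition to be kept as a single term, while sequences without a noun head are discarded.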
Next, we calculated the representative terms for each cluster. First, we used the concatenated title and the abstract of all the papers in the clusters as the "document" of each cluster. Then, we calculated the extent to which each word was representative of each cluster document using TF-IDF [29], which is the product of the term frequency (TF) and the inverse document frequency (IDF). This simple term-scoring method is empirically useful for a wide range of datasets and has been proven to represent "the amount of information of a term weighted by its occurrence probability" [30]. The equation for TF-IDF is presented below.
$$\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)$$
where t and d represent the term and the document, respectively; tf(t, d) is the log value of the TF; and idf(t) is the inverse ratio of the number of documents that include the term t (df_t). We analyzed the contents of the clusters by investigating the high TF-IDF terms in each cluster.
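A minimal sketch of this scoring is given below, treating each cluster as one document. The exact scaling is an assumption: the sketch uses tf = 1 + log(count) and idf = log(N / df_t), which is one common variant consistent with the description above but not necessarily the one used in this study. The function name and toy terms are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(cluster_terms):
    """Score terms per cluster document.

    `cluster_terms` maps cluster id -> list of extracted terms (with repeats).
    Returns cluster id -> {term: tf-idf score}.
    """
    n_docs = len(cluster_terms)
    doc_freq = Counter()
    for terms in cluster_terms.values():
        doc_freq.update(set(terms))  # each cluster counts once per term

    scores = {}
    for cluster_id, terms in cluster_terms.items():
        counts = Counter(terms)
        scores[cluster_id] = {
            # Assumed variant: log-scaled TF times log inverse document frequency.
            term: (1.0 + math.log(count)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical toy cluster documents.
toy_clusters = {
    "water": ["water resource", "water resource", "sustainability"],
    "energy": ["renewable energy", "sustainability"],
}
toy_scores = tfidf_scores(toy_clusters)
top_water_term = max(toy_scores["water"], key=toy_scores["water"].get)
```

A term that appears in every cluster, such as "sustainability" in the toy data, receives a score of zero, which is why TF-IDF surfaces the terms that distinguish one cluster from the others.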

Investigating Scientific Attention to SDGs
We investigated the scientific attention devoted to each SDG from two perspectives. First, we examined whether an academic cluster existed that was closely related to each SDG. If such a cluster exists, we can infer a correspondence between the academic cluster and the SDG. Then, we retrieved the convergently-discussed terms related to each SDG. Even if none of the academic clusters is closely related to the SDG, part of the content of the SDG may still be convergently discussed in selected academic papers. The detailed methods for these two investigations are described below.

Linguistic Similarity between an SDG and a Cluster
We assessed the scientific attention to every SDG by comparing academic clusters with SDGs. The comparison between the two was made by examining the text resemblance. Specifically, we calculated the cosine similarity of the TF-IDF vectors of the documents of each academic cluster and SDG. The TF-IDF vector of a cluster was composed of the TF-IDF values of each term in the cluster document, and the vectors of the SDGs were calculated in the same way using the SDG documents. A high similarity indicates that the cluster and the SDG share many representative terms; therefore, they are considered to share common research purposes, topics, and methods.
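The similarity measure can be sketched as follows for sparse TF-IDF vectors stored as dictionaries. The function name and the toy vectors are hypothetical, and the numbers are illustrative only.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    shared_terms = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical TF-IDF vectors for one SDG document and two cluster documents.
sdg_vector = {"marine resource": 2.0, "fish stock": 1.5, "ocean": 1.0}
fishery_vector = {"fish stock": 2.5, "marine resource": 1.0, "aquaculture": 0.5}
tourism_vector = {"tourist": 3.0, "destination": 2.0}

sim_fishery = cosine_similarity(sdg_vector, fishery_vector)
sim_tourism = cosine_similarity(sdg_vector, tourism_vector)
```

Only terms present in both documents contribute to the numerator, so the score is high only when an SDG and a cluster share highly weighted terms.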

Detecting the Scientifically Discussed Terms of an SDG
Even if none of the scientific fields correspond closely to an SDG, certain aspects of the SDG are still likely to have received convergent research attention. We searched for terms that represented the convergently-discussed topics of an SDG. The basic principle is that papers that include a term that is a convergently-discussed topic are likely to be connected via citations. For example, if the term "better accuracy" is assumed to appear in an area of information science, then the TF-IDF values of this term in the area would have a high score. However, the phrase is often simply used in this field of work rather than being convergently discussed. The citation network between papers that contain the phrase "better accuracy" is assumed to be sparse. Conversely, two papers that include the term "disaster risk reduction" are likely to be connected. Thus, we consider the term as a convergently-discussed term. We define the CDD metric via Equation (4).
CDD is the product of the size of the core component of the term, $\log(n(d^{lc}_t))$, the condenseness of the term, $n(e^{lc}_t) / n(d^{lc}_t)$, ...
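The principle behind the metric can be illustrated with a short sketch that, for a given term, finds the largest connected group of papers that both contain the term and cite one another. Because the full definition of CDD is not reproduced here, the sketch stops at the two ingredients named in the text: the number of papers and the number of citation links in that core component. The function name, the toy data, and the final ratio used as a reading of "condenseness" are assumptions.

```python
import math
from collections import defaultdict, deque


def core_component_stats(term, paper_terms, citations):
    """Return (papers, links) of the largest citation-connected group using `term`.

    `paper_terms` maps paper id -> set of terms; `citations` is a list of
    (citing_paper, cited_paper) pairs. Citation direction is ignored.
    """
    holders = {paper for paper, terms in paper_terms.items() if term in terms}
    neighbors = defaultdict(set)
    for citing, cited in citations:
        if citing in holders and cited in holders and citing != cited:
            neighbors[citing].add(cited)
            neighbors[cited].add(citing)

    visited, largest = set(), set()
    for start in holders:
        if start in visited:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        visited |= component
        if len(component) > len(largest):
            largest = component

    # Each undirected link is stored twice in the adjacency sets.
    n_links = sum(len(neighbors.get(p, ())) for p in largest) // 2
    return len(largest), n_links


# Hypothetical toy data.
toy_terms = {
    "p1": {"disaster risk reduction", "better accuracy"},
    "p2": {"disaster risk reduction"},
    "p3": {"disaster risk reduction"},
    "p4": {"better accuracy"},
    "p5": {"better accuracy"},
}
toy_links = [("p2", "p1"), ("p3", "p2"), ("p3", "p1"), ("p5", "p2")]

n_papers, n_links = core_component_stats("disaster risk reduction", toy_terms, toy_links)
size_factor = math.log(n_papers)      # log of the core component size
links_per_paper = n_links / n_papers  # assumed reading of "condenseness"
sparse_stats = core_component_stats("better accuracy", toy_terms, toy_links)
```

In the toy data, the papers containing "disaster risk reduction" cite one another and form a connected core, whereas the papers containing "better accuracy" share the phrase but no citations, which is the contrast the metric is designed to capture.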

Research Clusters of Sustainability Science
The 312,584 academic papers were classified into 163 clusters, and we analyzed the 23 clusters containing more than 1000 papers. The papers in these clusters are visualized in Figure 2, in which the different colors correspond to different clusters. The authors labeled each cluster heuristically after the clustering (Table 1 lists detailed information for each cluster). The clusters are numbered in descending order of the number of their papers. The layout of the papers, which is calculated using LargeVis [31], indicates the structure of the entire network. In short, the nodes (papers) are placed such that connected neighbors are located nearby. Therefore, the distance between nodes on the map indicates the closeness of their relationship, but the position itself and the axes have no intrinsic meaning.
Sustainability studies are composed of diverse clusters. This result indicates that scientists in various research fields have paid attention to sustainability. We first provide a quick look at these fields, except for those closely related to energy (which we analyze in Section 4.2). The largest cluster is Soil and agriculture (1). The related areas located near it in Figure 2 are Food (17), Forest (12), and Ecosystem (3). The Water (7) cluster is closely related to Ecosystem (3). The papers in these clusters mainly discuss the current state of the natural environment, changes in it, and technologies for improving it. The second-largest cluster is Corporate (2), which discusses corporate activities such as supply chain management and sustainable manufacturing. Its related areas are Food (17), CSR (15: Corporate Social Responsibility), and Urban (10: the impact of urban and human activities on nature). The papers of the Food cluster are located in two areas (to the right of Soil and agriculture (1) and to the left of Corporate (2)). The former area covers topics such as farming and food security, whereas the latter covers topics such as green marketing and the wine industry. The other large clusters are Building (5) and Smart City (6), which are located near each other. The research topics of the former are building materials, construction, and the use of buildings. The latter is about the "software" of the city: transportation, the sharing economy, the compact city, and so on. The clusters of Healthcare (11), Fishery (13), Tourism (14), Education (16), and Fiscal Sustainability (21) do not have a closely related cluster. For example, maternal care, HIV, and patient intervention are studied in Healthcare (11), and the protection of the marine environment and marine species is the main theme of Fishery (13). In Tourism (14), scholars seek an appropriate balance between the growing demand for tourism and its environmental impact [32].
The objective of sustainability education [33], the subject of Education (16), is "to integrate the principles, values, and practices that make up sustainable development into all aspects of education and learning" [34]. Education in early childhood, at universities, and in business schools is studied in the cluster. Fiscal Sustainability (21) is "the ability of government to maintain public finances at a credible and serviceable position over the long term" [35]. Financial policy, such as debt, is discussed in the cluster. Thus, the topics of sustainability are diverse, and researchers from various fields conduct sustainability science.

Detail of Energy-Related Subcluster
The clusters that are closely related to energy as a whole are Sustainable energy (others) (8) and Bioenergy (9). The Sustainable energy (others) cluster is mainly composed of papers about the consumption, potential, and allocation of energy, including renewable energy. The central topic of Bioenergy (9) is the production of bioenergy (mainly biofuel). The topics of these two clusters are not very similar, as indicated by the fact that the clusters are not located very close to each other in Figure 2.
In addition to the research in these clusters, we need to find energy-related papers that belong to clusters that do not discuss energy as a whole. For example, energy consumption in urban areas is discussed in the Urban (10) cluster. Thus, we investigate all sub-clusters and pick out the energy-related ones. Specifically, the sub-clusters whose top 15 TF-IDF terms include "energy" are considered energy-related. In addition, we manually identify the energy-related sub-clusters of Catalyst/electrochemical (4), which discusses energy-related technologies, because the word "energy" is not used explicitly in the papers of that cluster. These energy-related sub-clusters are listed in Table 2, along with the subclusters of Sustainable energy (others) (8) and Bioenergy (9).
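The keyword-based selection rule can be sketched as follows. The sketch assumes that a term "includes" the keyword when the keyword appears as one of its words; the function name, sub-cluster labels, and scores are hypothetical.

```python
def energy_related_subclusters(tfidf_by_subcluster, keyword="energy", top_k=15):
    """Return ids of sub-clusters whose top-k TF-IDF terms contain `keyword`.

    `tfidf_by_subcluster` maps sub-cluster id -> {term: tf-idf score}.
    """
    selected = []
    for sub_id, scores in tfidf_by_subcluster.items():
        top_terms = sorted(scores, key=scores.get, reverse=True)[:top_k]
        # Assumption: match the keyword as a whole word inside a multi-word term.
        if any(keyword in term.split() for term in top_terms):
            selected.append(sub_id)
    return selected


# Hypothetical scores for two sub-clusters.
toy_tfidf = {
    "10-7": {"urban metabolism": 3.1, "energy flow": 2.4, "material flow": 1.9},
    "11-1": {"maternal care": 4.0, "newborn": 2.2},
}
energy_ids = energy_related_subclusters(toy_tfidf)
```

A rule of this kind misses sub-clusters that study energy technologies without using the word itself, which is why the sub-clusters of Catalyst/electrochemical (4) are identified manually.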
In Table 2, the numbers in the leftmost column indicate the cluster number and subcluster number, and the subclusters are listed in order of average publication year. The newest subcluster (8-7) of Sustainable energy (others) (8) is composed of papers that discuss the causality between economic growth and energy consumption [36], and the economic aspects of regulation and measures for dissemination. Other topics of Sustainable energy (others) (8) are Renewable energy source and energy allocation, Microgrid, Desterilize of solar energy, Selection of renewable energy using the analytic hierarchy process (8-3), Building-integrated photovoltaics (BIPV) (8-9), and Renewable energy in a rural area (8-1). These papers discuss the consumption, potential, and allocation of energy, and do not discuss the production of efficient or low-environmental-load energy. For instance, although the titles/abstracts of 2300 papers in Sustainable energy (others) (8) include the term "wind", these papers are distributed across the sub-clusters.
Bioenergy (9) research mainly focuses on production. The cluster is divided into subclusters according to the type of bioenergy. The recent topics of bioenergy are Biosurfactant (9-9) and Microalgae (9-1). A biosurfactant, which is a surfactant that can be digested by microorganisms, is expected to serve as a new pollution-free food additive and oil processing/recovery approach [37,38]. Microalgae are expected to produce various organic compounds, such as enzymes and proteins, and accumulate waste in the water. Many microalgae exist in nature, and various applications are expected [39]. The other research topics are Lignocellulosic biomass (9-7), Biofuel from xylose (9-4), Microbial fuel cells (MFCs), and so on.
Energy is an essential element in other research areas of sustainability science. The energy-related subclusters of other clusters are shown in the bottom part of Table 2. The papers of Catalyst/electrochemical (4) mainly study materials/catalysts for energy storage or sources. For instance, in the Hydrogen evolution reaction (4-2) subcluster, a highly efficient hydrogen/oxygen evolution reaction contributes to highly efficient energy production and storage [40,41]. Other energy-related subclusters are Supercapacitor (4-7), which is expected to become a new form of future energy storage, and Hydroxymethylfurfural biomass (4-3), which concerns an organic material for biofuel derived from plant-based sugars. In the Building (5) cluster, energy is an essential factor in Life Cycle Assessment (5-4) and Building refurbishment. In Smart City (6), the energy-related subclusters are Electric vehicles (6-9) and Energy transition (6-3), which is a pathway to a zero-carbon energy society. Water-energy nexus (7-7), Urban metabolism (10-7), Ecological footprint (10-3), Green-ICT, and Waste Management are also related to energy.

Emerging Areas of Sustainability Science
Soil agriculture (1), Urban (10), Forest (12), and CSR (15) are mostly composed of papers that were not published recently. On the other hand, Catalyst/electrochemical (4), Building (5), Bioenergy (9), Nanogenerator (22), and Entrepreneurship (23) are relatively new fields. Ecosystem (3), Education (16), Food (17), and others fall in between. Most of the new fields intend to develop chemical/nanotechnology applications for sustainability. However, it would not be fair to conclude that the topics of relatively old clusters do not attract recent scientific attention. Two reasons for the freshness of a cluster can be considered: either the research area emerged recently, or researchers working in the area only recently associated their work with sustainability. In addition, the "freshness" of a cluster is affected by the publication speed of the field, which is higher in nanotechnology and chemistry than in politics and finance. Although it is difficult to distinguish the reasons for freshness from the retrieved dataset, the average publication year indicates the level of recent scientific attention.
To retrieve the trend within each cluster, we investigated the sub-clusters of each. Figure 3 illustrates the sizes of the subclusters and their corresponding average year of publication. In this figure, the data of the subclusters are aligned vertically with the corresponding parent clusters. We added the names of the newest and oldest subclusters, which were heuristically named afterward by the authors, to the plotted data for each cluster. The results in the figure show that relatively old and relatively new sub-clusters exist in each cluster. For example, arbuscular mycorrhizal fungi (soil microorganisms expected to be used for agriculture as microbial fertilizers) [42] is a new topic in the Soil and agriculture cluster (1). In the materials/bioscience clusters, Oxygen evolution reaction (4), Geopolymer (5), Renewable energy consumption (8), and Polyhydroxyalkanoates/biosurfactants (9) are emerging research topics. Each subcluster is detailed in the Supplementary Materials (S1). To detect the research fields that currently attract scientific attention, we listed the top 15 newest subclusters in Table 3. In this table, the numbers in the leftmost column indicate the cluster number and subcluster number, and the subclusters are listed in order of average publication year. Within each cluster, the subclusters are numbered in order of the number of their papers. These subclusters provide an overview of the recent trend of scientific attention to sustainability. The five subclusters of Cluster 4 (i.e., Hydrogen/Oxygen evolution reaction (4-2), Porous carbon (4-7), Lithium-ion battery (4-11), Nanocellulose (4-6), Bisphenol, and Hydroxymethylfurfural yield (4-3)) were extracted as the top new subclusters. We explain these research fields in detail in the context of sustainability. A highly efficient hydrogen/oxygen evolution reaction is an essential technology for the production of solar cells and metal-air batteries [40,41].
Porous carbon is a suitable material for CO2 sorbents and carbon capture [43]. Lithium-ion batteries, which form an indispensable technology in modern society, should be sustainably produced and recycled [44]. Nanocellulose is an emerging material for various advanced applications, such as water purification [45]. Hydroxymethylfurfural (HMF) is an organic compound derived from plant-based sugars that is a material of biofuels [46].
We also focus on other topics: Maternal/Newborn healthcare (11-1), Bike-sharing (6-7), Smart city (IoT/Big data), and Edible insects, all of which have been discussed recently. The Supplementary Materials (S1) show that research in Cluster 6 (Smart city) is mainly composed of transportation engineering, economics (6-6, 6-8), and social science (6-4, 6-2). Thus, Cluster 6 also contains the accumulation of research in diverse scientific fields. In other clusters, such as Ecosystem (3) and Building (5), we also observed attention from diverse research fields. These results indicate that academic research topics, methods, and objectives in many areas have complementary relationships. Although some of these topics may be familiar to experts in each field, we consider the investigation of recent trends in sustainability science to be useful for scientists and decision-makers in companies or in the government.
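The ranking by average publication year that underlies Figure 3 and Table 3 can be sketched as follows. The function name, sub-cluster labels, and years are hypothetical and serve only to show the calculation.

```python
from statistics import mean


def newest_subclusters(years_by_subcluster, top_n=15):
    """Rank sub-clusters from newest to oldest by average publication year.

    `years_by_subcluster` maps sub-cluster id -> list of publication years.
    Returns a list of (sub-cluster id, average year), newest first.
    """
    averages = {
        sub_id: mean(years)
        for sub_id, years in years_by_subcluster.items()
        if years  # skip empty sub-clusters to avoid averaging nothing
    }
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical publication years for three sub-clusters.
toy_years = {
    "4-2": [2018, 2019, 2019],
    "1-3": [2009, 2012],
    "6-7": [2017, 2019],
}
ranking = newest_subclusters(toy_years, top_n=2)
newest_ids = [sub_id for sub_id, _ in ranking]
```

As noted above, a high average year can reflect either a genuinely new research area or a field with a fast publication cycle, so the ranking is best read as an indicator of recent attention rather than of novelty alone.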

Scientific Attention to SDGs
In the previous section, we discussed the diverse focus areas of research attention and the dynamic changes this attention underwent. Next, we discuss the research attention to sustainability from the viewpoint of the SDGs. Figure 4 shows the linguistic similarity between each SDG and the academic clusters. We found certain research clusters to correspond to specific SDGs. For example, the Good Health and Well-being (3) SDG is convergently researched in the Healthcare (11) cluster. Similarly, the Life Below Water (14) SDG and the Fishery (13) cluster, the Zero Hunger (2) SDG and the Soil Agriculture (1) cluster, and the Clean Water and Sanitation (6) SDG and the Water Resource (7) cluster are corresponding pairs. These SDGs, therefore, have corresponding academic clusters that discuss the means to achieve them. However, many SDGs have multiple corresponding clusters. For example, the No Poverty (1) SDG is linguistically similar to the Ecosystem (3), Healthcare (11), and Soil Agriculture (1) clusters. The same holds for other SDGs (e.g., SDGs 8, 10, 11, 16, and 17). In this way, Figure 4 indicates the relationships between scientific attention and SDGs. A few SDGs, such as Reducing Inequality (10), have neither a single nor multiple corresponding clusters. This does not mean that academic researchers do not consider these SDGs; selected components of an SDG might be researched in scientific publications spread across diverse clusters. To verify this, we retrieved the terms that were convergently discussed in the scientific publications and appear in the SDG documentation. Table 4 lists the convergently-discussed terms (CDD) in each SDG document and the representative cluster (the cluster in which the term was most often used) for each SDG. We also listed the words with the 10 highest TF-IDF values of each SDG in the columns on the right. The TF-IDF values indicate the representative contents of the SDGs.
The words with high CDD values indicate scientific attention to SDGs that are likely to be practical issues/solutions. For example, retirement age, microfinance, disaster risk reduction, and social protection are among the terms with the highest CDD values in SDG 1 No Poverty. The main description of this goal is reflected in the top TF-IDF words: "Extreme poverty, disaster risk, and social protection system". The top words of TF-IDF and CDD are interpreted as follows: the top TF-IDF words indicate the main focus of the SDG, and the top CDD words indicate the scientific attention/progress related to it.
A comparison between Figure 4 and Table 4 indicates the structure of the attention that the SDGs have received from academic research. The first pattern shows that the convergently-discussed terms converge to a single cluster or to specific clusters. SDGs 3 and 14 fall within this type of classification. In this case, we obtain the particulars of the scientific attention that the SDG received from the cluster. However, the representative clusters of the convergently-discussed words for SDG 1 are diverse. For example, the cluster in which the term microfinance is used most often is Financial sustainability (21), whereas the term disaster risk appears mainly in the Environment (3) cluster.
First, we focus on SDGs 16 (Peace, Justice, and Strong Institutions) and 17 (Partnership for the goals) because these goals do not have a corresponding cluster, and the clusters with convergently-discussed terms are diverse. Studies concerning SDG 16 and 17 appear to be challenging for researchers in the hard sciences to imagine. In fact, these SDGs do not have corresponding clusters (in Figure 4). Using the CDD, we found some essential terms that have received research attention. For instance, prisoners, crime, bribery, and judiciary listed in SDG 16 are important terms in academic research. We also retrieved the terms "inclusive society" and "substantial advance". The former phrase is highly important for a sustainable society and is frequently used in official government documents and corporate marketing. "Substantial advance" is an important keyword for evaluating persons, organizations, or governments for their contribution to a sustainable society. Considering the result for SDG 17, many terms such as "policy coordination" and "multi-stakeholder partnership" are convergently discussed. Although these terms might be highly familiar to an expert in these areas, they seem to be informative for people who need to understand the research attention paid to SDGs.
Next, we discuss the scientific attention received by Affordable and Clean Energy (SDG 7). This goal mainly mentions renewable energy, as shown by the top TF-IDF terms (renewable energy, electricity sector, and clean fuel). The scientific attention this goal has attracted is diverse: Stove (cluster 12), Renewable Energy Consumption (8), Charcoal (12), Geothermal (8), Marine source (8), Energy Intensity (10), and so on. Therefore, the scientific attention to the SDG mainly originates from scientific fields that focus on energy, in particular, Sustainable Energy (others) (8). However, the detailed topics shown in the previous section, such as biosurfactant and microalgae, are not detected by this method because these words are not specifically mentioned in the collected document of SDGs.
Finally, we focus on the Catalyst/Electrochemical (4) cluster. The terms relating to the cluster are not prominent in the SDGs. The main topic discussed in this cluster is the performance improvement of applications by enhancing or developing new catalysts and materials. Typical applications are oxygen evolution reactions, supercapacitors, fabricated batteries, and so on. Despite the progress in these topics being highlighted in newspapers and scientific journals, these activities are not detailed in the document describing the SDGs, which are a comprehensive assembly of goals from many viewpoints for sustainable societies. Thus, activities that improve energy efficiency are not a focus of the SDGs, even though their impact is very high. From a practical viewpoint, research in the Catalyst/Electrochemical (4) cluster contributes to each of the SDGs indirectly; for example, the reduction of CO2 and highly efficient oxygen/hydrogen generation contribute to the Climate action, Life below water, and Life on land SDGs. This analysis indicates the need for new documents, besides the SDGs, that evaluate recent progress of studies that straightforwardly contribute to the energy and environment system.

Conclusions
Using network-based classification and text analyses, we investigated the nature and amount of scientific attention devoted to sustainability. In addition, we detected dynamic changes in sustainability science and identified emerging fields (e.g., those involving nanocellulose and oxygen evolution reactions). Scientific clusters (e.g., bio/renewable energy and smart cities) are composed of diverse research fields (e.g., materials science, social science, and economics). We also observed the relationship between the SDGs and scientific research and succeeded in retrieving the important terms that are convergently discussed in academic papers, such as "inclusive society" and "early childhood development". These results should be useful for analyzing the scientific attention received by SDGs, which is essential to enable government officials, companies, and research organizations to make decisions concerning the funding of and investments in scientific research. These implications may be useful for scientists (including those from the hard sciences) who are planning new research topics. We also discussed the limitations associated with analyzing scientific progress using the SDG documents. For example, certain activities involving a straightforward improvement, such as energy efficiency, are not referred to in SDG documents. Although we did not discuss all the scientific clusters and retrieved terms, further details are published in the Supplementary Materials and may affect the interpretation of the findings we report in this paper.
Our study did not include academic papers that did not contain the terms "sustainability" or "sustainable." In the future, we need to analyze the whole area related to sustainability research, such as wind/solar energy. We also need to develop a model that considers the differences in word usage between the SDGs and academic publications to enable more accurate analysis. Finally, it is important to note that the results of this paper will be out of date in merely a few years, at which time the scientific attention to the SDGs will need to be analyzed again.