Bibliometric dataset (1995–2022) on green jobs: A comprehensive analysis of scientific publications

The realm of green jobs presents a fertile ground for understanding the intersecting pathways between sustainable transition and the labor market. We have built a bibliometric dataset centered on this concept, gathering 414 articles from the Scopus and Web of Science databases, published between 1995 and 2022 and selected following the PRISMA protocol. This endeavor aims to depict the dynamics, themes, and conceptual approaches shaping the discourse on green jobs. The dataset, structured around 13 descriptive variables such as authors, keywords, and cited references, is made available to researchers, institutions, and decision-makers to provide insight into the academic debates on ecological transition through the lens of employment, especially in the context of a green economy. The potential for reusing these data is broad. They can serve as a foundation for comparative analyses with media and institutional portrayals of green jobs. Furthermore, the dataset can be enriched by integrating other forms of literature, such as books, chapters, or conference proceedings, while retaining the existing structure. This extensibility paves the way for multidisciplinary and multilingual exploration, thereby enhancing the richness and diversity of possible analyses.


Value of the Data
• These data are valuable as they provide an in-depth analysis of publications in scientific journals on green jobs, revealing trends, themes, conceptual pillars, and outlooks in this field. They shed light on discussions about ecological transition and employment, especially within the context of a green economy. Delving into these data can also highlight gaps in current research and steer the course of future academic studies.
• Various stakeholders stand to gain. Researchers, academics, institutions, and decision-makers can harness the dataset to deepen their understanding of green job-related issues without the need for preliminary sorting to obtain pertinent texts.
• Rooted in a comprehensive analysis of international political economy, the data underscore the evolution of the green job concept, offering academic insights. A comparison with media and institutional portrayals of this idea would be relevant.
• Comprising articles from scientific journals, the dataset can be expanded with books, chapters, or conference proceedings, while maintaining the same structure. Contributions in other languages or from journals not indexed in Scopus and Web of Science can also be included.

Background
This bibliometric dataset has been developed as part of a study on the conceptual foundations of green jobs and their academic representation. It aims to explore the resonance and compatibility of this concept in various contexts, such as those of the Global North and the Global South. To date, green jobs have lacked a consensual definition, which presents a significant challenge and makes the concept difficult to operationalize. However, the definition provided by the ILO and the United Nations in 2008 is often used as a benchmark, identifying green jobs as key positions across numerous sectors that are essential for the preservation and conservation of the environment, while also promoting the creation of decent jobs [1]. In the face of current environmental, economic, and social challenges, green jobs fluctuate in the literature between being agents of short-term economic transformation and elements of a sustainable and inclusive green economy. This ambivalence highlights the need for a comprehensive approach to their integration into sustainable development strategies. This project is part of a broader effort to assess the role of green jobs in sustainable development and their potential to reconcile environmental preservation, economic advancement, and social justice.

Data Description
The database, named "BDGJ" as an acronym for Bibliometric Dataset on Green Jobs [2], primarily consists of a dataset titled "BDGJ_dataset.csv". This file includes a single sheet, also named "BDGJ_dataset", which compiles information pertaining to 414 articles written in English, exclusively derived from the scientific databases Scopus and Web of Science, spanning from the year 1995 to 2022. The dataset encompasses 13 descriptive variables, titled in accordance with the field tags from the Web of Science Core Collection, represented as two-character tags. Table 1 provides a description of these variables within the data corpus and details their significance and the analytical possibilities each variable offers. Additionally, a separate file, named "BDGJ_variables.txt", outlines the definitions of each variable.

[...] Analytical possibilities: comparison of contributions and visibility between different databases.

TI (Document Title). The full title of the journal article. Analytical possibilities: identification of articles, analysis of title trends over time.

PY (Year Published). The year of publication of the paper in a journal. Analytical possibilities: temporal and evolutionary analysis of research, identification of periods of growth or change within a field.

SR (Short Reference). Title of the article along with its year of publication and journal. Short tag of the document. Analytical possibilities: allows for direct referencing of the specific article, without the need to link between various variables.
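As a minimal illustration of how the two-character field tags can be used, the sketch below counts records per publication year from the PY column. It relies only on the Python standard library. The in-memory sample is a hypothetical stand-in for "BDGJ_dataset.csv": only the tags TI and PY come from the description above, and the rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for "BDGJ_dataset.csv". Only the column tags TI and PY
# are taken from the article; the rows are invented placeholders.
SAMPLE_CSV = """TI,PY
Placeholder title A,2010
Placeholder title B,2010
Placeholder title C,2021
"""


def publications_per_year(csv_file):
    """Count records per publication year using the PY column."""
    reader = csv.DictReader(csv_file)
    # Skip rows whose PY field is empty, then count each year.
    counts = Counter(int(row["PY"]) for row in reader if row["PY"].strip())
    # Return the counts ordered by year.
    return dict(sorted(counts.items()))


counts = publications_per_year(io.StringIO(SAMPLE_CSV))
print(counts)
```

With the real file, the same function could be called on a file object opened with `open("BDGJ_dataset.csv", newline="", encoding="utf-8")`, assuming the file is comma-delimited, UTF-8 encoded, and has a header row; none of these properties is stated in the article.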
The integral elements of a scientific article are interconnected. These interconnections create bipartite networks, which can be depicted as rectangular matrices linking manuscripts to variables, a fundamental structure for network, occurrence, or co-occurrence analysis. Moreover, scientific articles often cite other studies, thus forming citation or coupling networks. Investigating these metrics unveils significant aspects of the respective research system.
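The rectangular-matrix view can be sketched in a few lines. Assuming a binary document-by-term matrix A (rows are manuscripts, columns are keywords), the term co-occurrence matrix is the product of the transpose of A with A: its off-diagonal entries count the documents in which two terms appear together, and its diagonal gives each term's occurrence count. The documents and keywords below are hypothetical placeholders chosen for illustration only.

```python
# Hypothetical records: each document identifier maps to its set of keywords.
docs = {
    "doc1": {"green jobs", "green economy"},
    "doc2": {"green jobs", "renewable energy", "green economy"},
    "doc3": {"renewable energy", "employment"},
}


def incidence_matrix(docs):
    """Build the binary document-by-term matrix A of the bipartite network."""
    terms = sorted(set().union(*docs.values()))
    names = sorted(docs)
    matrix = [[1 if term in docs[name] else 0 for term in terms] for name in names]
    return names, terms, matrix


def cooccurrence(matrix):
    """Compute C = A^T A: C[i][j] is the number of documents containing terms i and j."""
    n_terms = len(matrix[0])
    return [
        [sum(row[i] * row[j] for row in matrix) for j in range(n_terms)]
        for i in range(n_terms)
    ]


names, terms, A = incidence_matrix(docs)
C = cooccurrence(A)
for term, row in zip(terms, C):
    print(f"{term:18s} {row}")
```

The same construction applied to a document-by-cited-reference matrix, with the product taken in the other order (A times its transpose), yields a bibliographic coupling matrix between documents.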
The variables mentioned are fundamental for a standard bibliometric analysis. They cover a wide range of important information and were chosen specifically to allow for the most comprehensive study possible. Nevertheless, the inclusion of variables such as affiliations, which facilitate the mapping of institutional and geographic collaborations and the examination of research networks, would have enriched the analysis. The selection of these variables was predicated on the availability of data from the Scopus and Web of Science databases. Variables were omitted if the relevant information was too sparse across the articles to support a substantive analysis.
Figs. 1-6 depict bibliometric analyses showcasing the most cited publications, journals, and authors, along with the most frequently occurring terms in abstracts and titles. These summaries provide a preliminary overview of the insights gleaned from these data.

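Table 2
Three thematic groups related to green jobs.
Cluster 1: Green jobs and Economy. Themes centered on green economy, renewable energies, technologies, and policies.
Cluster 2: Employment and Institutions. Themes at the confluence of policy and education, emphasizing the role of institutions.
Cluster 3: Human and Social Capital. Focus on the individual, particularly the worker, with respect to employment dignity and working conditions in green sectors.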
The thorough analysis of the variables contained in this database has led to significant conclusions. For instance, it was possible to identify three major thematic axes guiding the discussion on green jobs: 1) Green jobs in the economic context; 2) The interactions between employment and institutional frameworks; 3) Individual dimensions and social capital. These axes are summarized in Table 2. Furthermore, our analysis has uncovered five key challenges in promoting the development of green jobs across a broad range of countries, with particular focus on developing nations. These challenges, summarized in Table 3, include the integration of the informal sector into development policies, the valorization of diverse economic sectors, the development of participatory strategies, the establishment of a well-defined economic framework, and the enhancement of investments in human capital.

Experimental Design, Materials and Methods
During the data collection process, the "Preferred Reporting Items for Systematic Reviews and Meta-Analyses" (PRISMA) protocol was adhered to, ensuring a meticulous selection of articles. This methodology is widely endorsed for enhancing the transparency, clarity, and quality of literature reviews [3,4]. The PRISMA protocol unfolds over four distinct stages, in line with the guidelines set by Moher et al. (2009) [4].

Identification
This approach merges the Scopus and Web of Science databases, distinguishing itself from many literature reviews that rely on a single source. The method of integrating multiple databases is rarely found in the existing literature [5]. Several works have highlighted the unique strengths of each database, such as the extensive temporal coverage of Web of Science [6] or the broad range of publications in Scopus [6,7]. Although Scopus and Web of Science show a strong correlation [8], many researchers have emphasized the value of analyzing both concurrently, as their data complement each other [5,9,10]. Integrating the two datasets can pose challenges, especially due to variations in article information depending on whether the source is Scopus or Web of Science [5]. A method to merge the Scopus and Web of Science databases has been replicated using the R 'Bibliometrix' package and Excel [11,12]. With this in mind, initial searches on Scopus and Web of Science were conducted separately before merging the data during the second phase of the PRISMA protocol.
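The general idea of merging two exports and removing duplicates can be sketched as follows. This is not the procedure used for the dataset, which relied on the R 'Bibliometrix' package and Excel; it is a simplified Python illustration that assumes a duplicate can be recognized by a normalized title together with the publication year. The records, the `sources` key, and the matching rule are all hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_records(scopus, wos):
    """Merge two record lists, keeping one entry per (normalized title, year) pair.

    Each merged record carries a hypothetical "sources" list naming the
    database(s) in which it was found.
    """
    merged = {}
    for source, records in (("Scopus", scopus), ("Web of Science", wos)):
        for rec in records:
            key = (normalize_title(rec["TI"]), rec["PY"])
            if key in merged:
                # Same article already seen: only record the extra database.
                merged[key]["sources"].append(source)
            else:
                merged[key] = {**rec, "sources": [source]}
    return list(merged.values())


# Invented placeholder records, not taken from the dataset.
scopus = [
    {"TI": "Placeholder Title A", "PY": 2012},
    {"TI": "Placeholder title B", "PY": 2015},
]
wos = [
    {"TI": "PLACEHOLDER TITLE A.", "PY": 2012},
    {"TI": "Placeholder title C", "PY": 2020},
]

merged = merge_records(scopus, wos)
for rec in merged:
    print(rec["TI"], rec["PY"], rec["sources"])
```

Matching on a persistent identifier such as a DOI, where both exports provide one, would be more robust than title matching; the title-based key is used here only because TI and PY are the fields described in this article.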
Articles mentioning the term 'green jobs' or similar expressions in their title, abstract, or keywords were sought (see Table 4). Using site-specific search strings, both singular and plural forms of the expressions were considered (Table 4).
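To make the notion of site-specific search strings concrete, the sketch below assembles one query per database from a list of expressions, adding the singular and plural form of each. The field codes `TITLE-ABS-KEY` (Scopus) and `TS=` (Web of Science topic search) are the standard tags of those platforms, but the strings actually used for the dataset are those listed in Table 4 and may differ. Only the expression 'green jobs' is taken from the text; the helper names and the naive pluralization rule are assumptions.

```python
def quoted_variants(term):
    """Return the quoted singular and plural forms of an expression.

    Naive rule (an assumption): the plural is the singular plus a trailing "s".
    """
    singular = term[:-1] if term.endswith("s") else term
    return [f'"{singular}"', f'"{singular}s"']


def build_queries(terms):
    """Build one search string per database from a list of expressions."""
    clause = " OR ".join(v for term in terms for v in quoted_variants(term))
    return {
        # Scopus: search in title, abstract, and keywords.
        "Scopus": f"TITLE-ABS-KEY({clause})",
        # Web of Science: topic search.
        "Web of Science": f"TS=({clause})",
    }


queries = build_queries(["green jobs"])
for database, query in queries.items():
    print(f"{database}: {query}")
```

Note that a Web of Science topic search also covers Keywords Plus in addition to title, abstract, and author keywords, so the two queries are close but not strictly equivalent in scope.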

Limitations
The data in this document have several limitations. Firstly, they are based on a set of references that is not comprehensive. The exclusive reliance on English-language literature, the strict selection of articles, and the restriction of the search to a predefined list of keywords might limit the scope of the dataset. Moreover, while Scopus and Web of Science offer a vast collection of well-referenced articles, their coverage of non-English journals is limited. Additionally, it would be crucial to delve into the grey literature to understand the developmental strategies of this concept at institutional, political, and media levels. These constraints suggest the need to broaden the research, especially by considering other languages and literature sources.

Ethics Statement
This work complies with the ethical requirements for publication in Data in Brief. These data do not incorporate studies involving animals or humans, nor data gathered from individual social media accounts. The primary data sources used in our study needed no specific permissions and comply with relevant ethical and legal standards.

Fig. 1. Most Global Cited Documents. This metric reflects how often a document is cited by others within the comprehensive database, such as Web of Science or Scopus. This information is provided by these databases and incorporated into metadata records. This measure serves as an indicator of a document's overall impact across the entire bibliographic database.

Fig. 2. Ten of the most relevant sources. A source is a journal that published one or more documents included in our collection of 268 sources.

Fig. 3. Ten of the most relevant authors. The selected frequency measure is the number of documents per author.

Fig. 6. Trend topics derived from Keywords Plus are shown as bubbles, with size indicating occurrence frequency. The grey bar shows the median occurrence distribution. The graph plots time on the horizontal axis and topics on the vertical axis. The median year of occurrences sets the reference year for each topic. For clarity, only a limited number of the most frequent topics are displayed for each year.

Cluster 1: Green jobs and Economy. Themes centered on green economy, renewable energies, technologies, and policies.
Cluster 2: Employment and Institutions
Cluster 3: Human and Social Capital

Fig. 7. Flowchart of PRISMA procedures and results at each stage.

Table
Subject: Economic Development and Growth
Specific subject area: Bibliometric metadata sourced from the Scopus and Web of Science databases, pertaining to academic literature articles focused exclusively on green jobs.
[...] compiled by merging results from similar searches on the Scopus and Web of Science databases. The merging process and removal of duplicates were facilitated using the Bibliometrix software and Excel. No time constraints were applied, and initially, the maximum number of variables was retained. Variable selection was later based on their presumed relevance. Article sorting was conducted following the PRISMA protocol, involving a review of abstracts and, when necessary, the full articles. The entire database is consolidated into a single spreadsheet.
Data source location: The collected data originate from the electronic repositories Scopus and Web of Science, focusing on research articles written in English, with no temporal period restrictions.

Table 1
Description of the 13 variables of BDGJ.

Table 3
Five challenges for developing green jobs.

Table 4
Bibliographic databases and Keywords.