Bibliographic dataset of literature for analysing global trends and progress of the machine learning paradigm in space weather research

The field of space weather research has witnessed growing interest in the use of machine learning techniques. This can be attributed to the increasing accessibility of data, which has created high demand for investigating scientific phenomena with data-driven methods. The dataset is based on bibliographic records from the Web of Science (WoS) and Scopus and covers publications from 2008 to August 2023. It reflects the multidisciplinary trends in this topic and reveals significant advances in current knowledge. It provides a comprehensive examination of publication trends, using publications, document sources, authors, affiliations, and frequent words as bibliometric indicators, all of which were analysed with the web-based Biblioshiny application. The dataset serves as a set of document profile metrics that highlights the breadth and progress of past and current studies, offering insight into research hotspots and influential entities that can inform future research.

Dataset link: Bibliographic dataset and analysis of machine learning in space weather (Original data)

Keywords: Bibliometric evaluation; Development trends analysis; Literature review data; Open-source R-package; Visualisation
© 2023 The Author(s).

Value of the Data
• Compilation of research on the progress and impact of machine learning in space weather studies, based on bibliographic records from 2008 to 2023 (a 16-year span).
• Scholars, researchers, scientists, policymakers, and agency stakeholders can benefit from the analytical information derived from the dataset; its bibliometric outcomes enable them to better understand this specific research area and its potential expansion.
• Reuse of the dataset can provide insights into scholarly publications, research productivity, changes in research activity over time, and future research directions, and may offer policy guidance that encourages collaborative work in multidisciplinary research.
• The methodology used to analyse the bibliographic records, together with its graphical visualisations in the form of graphs and charts, could be replicated in other bibliometric or scientometric studies.
• The findings of this study can provide a broader perspective on this area: the development and growth of the research domain, its evolution, and potential future research areas.

Objective
Bibliometric indicators are a valuable approach for evaluating and analysing research literature, allowing various themes and disciplines to be investigated through their production patterns, trends, and publication impact [1, 2]. The aim of this article is threefold: to employ a methodological approach based on the PRISMA framework for conducting a literature review, to produce bibliometric datasets that specifically pertain to machine learning within the field of space weather, and to present these datasets as graphical visualisations. The findings of this paper constitute an initial foundation for further research and for content analysis of the applications of machine learning in different sub-areas of space weather studies.

Data Description
The dataset contains bibliographic data related to machine learning and space weather studies. It is structured into three folders: Dataset, Analysed data_figures, and Analysed data_tables. This study reports bibliometric analyses of publications, journal sources, authors, affiliations, and the words most frequently used in abstracts and titles.
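Frequent-word analysis is one of the indicators listed above. As a rough illustration of what such a count involves, the Python sketch below tallies word frequencies across titles and abstracts. It is not how Biblioshiny computes its word statistics; the record layout (dicts with `title` and `abstract` keys), the `frequent_words` function, the short stopword list, and the two sample records are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; real tools use much longer lists.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def frequent_words(records, top_n=10):
    """Return the top_n most frequent non-stopword words in titles and abstracts.

    Each record is assumed to be a dict with optional 'title' and 'abstract' keys.
    """
    counts = Counter()
    for rec in records:
        text = f"{rec.get('title', '')} {rec.get('abstract', '')}".lower()
        # Alphabetic tokens, allowing internal hyphens such as "data-driven".
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text)
        counts.update(t for t in tokens if t not in STOPWORDS)
    # Ties keep the order in which words were first encountered.
    return counts.most_common(top_n)


# Made-up sample records, for illustration only.
records = [
    {"title": "Machine learning for space weather forecasting",
     "abstract": "We apply machine learning to solar wind data."},
    {"title": "Forecasting geomagnetic storms with neural networks",
     "abstract": "A neural network predicts storms."},
]
print(frequent_words(records, top_n=5))
```

Note that "network" and "networks" are counted separately here; a fuller analysis would normally add stemming or lemmatisation before counting.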

Experimental Design, Materials and Methods
As shown in Fig. 7, a systematic search strategy and framework were outlined to document the procedures implemented. The bibliometric analysis consists of three main elements: first, identifying the topic and compiling the literature; second, screening; and finally, selecting the appropriate literature and performing the bibliometric analysis. Following the PRISMA guideline, articles on machine learning were retrieved from two reputable, wide-coverage databases: the Web of Science (WoS) and Scopus. WoS comprises robust, quality-assured, reliable journals with wide disciplinary coverage, while Scopus comprises the world's largest and most diverse body of existing literature. Combining the two databases assures quality by including only reputable and robust journals, which in turn maintains the quality of the review outcomes.
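Pooling the two databases requires their exports to share one record layout. The Python sketch below shows one way this could be done. The `FIELD_MAP`, `harmonise`, and `combine` names are hypothetical, and the column names (two-letter WoS field tags such as TI, PY, and DI, and Scopus CSV headers such as "Source title") are assumptions to check against the actual export files. The bibliometrix R package behind Biblioshiny has its own import and merge routines, which this sketch does not reproduce.

```python
# Hypothetical mapping from each database's export columns to a shared schema.
# WoS exports use two-letter field tags; Scopus CSV exports use descriptive
# headers. Verify these names against your own export files.
FIELD_MAP = {
    "wos": {"TI": "title", "AB": "abstract", "PY": "year", "SO": "source", "DI": "doi"},
    "scopus": {"Title": "title", "Abstract": "abstract", "Year": "year",
               "Source title": "source", "DOI": "doi"},
}


def harmonise(rows, database):
    """Rename raw export columns to the shared schema and tag the source database."""
    mapping = FIELD_MAP[database]
    records = []
    for row in rows:
        # Missing columns become empty strings so every record has the same keys.
        rec = {common: row.get(raw, "") for raw, common in mapping.items()}
        rec["database"] = database
        records.append(rec)
    return records


def combine(wos_rows, scopus_rows):
    """Identification stage: pool the records from both databases without filtering."""
    return harmonise(wos_rows, "wos") + harmonise(scopus_rows, "scopus")


# Made-up rows standing in for one exported record from each database.
wos_rows = [{"TI": "Deep learning for Dst forecasting", "PY": "2021", "DI": "10.1000/a1"}]
scopus_rows = [{"Title": "Deep Learning for Dst Forecasting", "Year": "2021", "DOI": "10.1000/A1"}]
pooled = combine(wos_rows, scopus_rows)
print(len(pooled), pooled[1]["title"])
```

The pooled list deliberately still contains the same paper twice; removing such overlaps belongs to the later screening step.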

Topic, scope and eligibility
The identification process in the systematic review involved four key elements, comprising the keyword search process and several search result options. The keywords, and the resulting articles, were selected using carefully constructed query strings that were also confirmed by an expert in this field. Two searches were conducted in both databases, one on the title, abstract, and keywords (TITLE-ABS-KEY) and one on the title (TITLE) alone. The search was limited to two main keywords, machine learning and space weather, so that the results represent the core of the field of interest and form a reasonable result set. Finally, overlapping documents were removed through title and abstract screening to avoid redundancy between publications.
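The exact query strings are not reproduced in this article. Assuming the two keywords were combined with AND as quoted phrases, the searches could be generated as in the sketch below. The `build_queries` helper and the use of AND are assumptions, as is treating the WoS Topic tag (TS=) as the counterpart of the Scopus TITLE-ABS-KEY field code; WoS Topic searches also cover Keywords Plus, so the two are not strictly equivalent.

```python
def build_queries(term_a, term_b):
    """Build field-restricted queries for Scopus and WoS (hypothetical helper).

    Scopus field codes: TITLE-ABS-KEY() and TITLE().
    WoS advanced-search tags: TS= (Topic) and TI= (Title).
    """
    # Quote each term so it is matched as an exact phrase.
    phrase = f'"{term_a}" AND "{term_b}"'
    return {
        "scopus_title_abs_key": f"TITLE-ABS-KEY({phrase})",
        "scopus_title": f"TITLE({phrase})",
        "wos_topic": f"TS=({phrase})",
        "wos_title": f"TI=({phrase})",
    }


for name, query in build_queries("machine learning", "space weather").items():
    print(f"{name}: {query}")
```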

Screening
The screening process involved selecting eligible articles and removing redundant documents. The searches, run on August 21, 2023, initially returned 256 records from Scopus and 225 from WoS, which were exported as full records without filtering the search results. Both searches covered all available years (2008 to August 2023), and no filter options were selected, so that all metadata from both databases were captured. Because the filter options offered by the two databases do not match, document type, language, timeline, and country were all retrieved and collected as returned by each database, without filtering, before proceeding to the bibliometric analysis. Some documents were then excluded to clean out duplicates present in both databases and to filter out irrelevant documents, with each document examined twice.
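Records appearing in both databases have to be spotted before they can be removed. The sketch below illustrates one common first pass: match on the DOI when it is present and otherwise on a normalised title. The `dedup_key` and `remove_duplicates` helpers and the sample records are hypothetical; the article describes the removal as a title and abstract screening, so a pass like this would at most shortlist candidates for that manual check.

```python
import re


def dedup_key(record):
    """Return a key for spotting the same document in both databases.

    Prefer the DOI (case-insensitive); otherwise fall back to the title with
    case, punctuation, and repeated whitespace removed.
    """
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9 ]", "", (record.get("title") or "").lower())
    return ("title", " ".join(title.split()))


def remove_duplicates(records):
    """Keep the first occurrence of each key; return (unique records, number removed)."""
    seen, unique = set(), []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique, len(records) - len(unique)


# Made-up records: two DOI matches and two title-only matches.
records = [
    {"title": "Deep learning for Dst forecasting", "doi": "10.1000/a1"},
    {"title": "Deep Learning for Dst Forecasting", "doi": "10.1000/A1"},
    {"title": "Solar flare prediction: a review", "doi": ""},
    {"title": "Solar Flare Prediction - A Review"},
]
unique, removed = remove_duplicates(records)
print(len(unique), removed)
```

Because a record keyed by DOI never matches one keyed by title, a paper that has a DOI in one database but not in the other slips through, which is one reason a manual check of titles and abstracts remains necessary.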

Included
The screened articles were further analysed for relevance to the intended scope by examining their titles and abstracts. Of the 481 documents assessed, 207 were excluded because post-screening revealed duplicates or articles outside the research domain of space weather and machine learning. The remaining 274 documents were included in the bibliometric analysis, which was performed with Biblioshiny [3], the web-based interface of the bibliometrix R package, which supports bibliometric analysis, citation metric analysis, and network mapping.
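The record counts reported in the screening and inclusion steps are consistent: 256 + 225 = 481 records identified, and 481 - 207 = 274 included. The hypothetical `prisma_counts` helper below only reproduces this tally. It does not split the 207 exclusions into duplicates and out-of-scope records, because the article does not report that breakdown.

```python
def prisma_counts(scopus_hits, wos_hits, excluded):
    """Tally a simple PRISMA-style flow: records identified, excluded, and included."""
    identified = scopus_hits + wos_hits
    # Guard against impossible inputs, e.g. more exclusions than records.
    if not 0 <= excluded <= identified:
        raise ValueError("excluded must lie between 0 and the number of identified records")
    return {"identified": identified, "excluded": excluded, "included": identified - excluded}


print(prisma_counts(scopus_hits=256, wos_hits=225, excluded=207))
# -> {'identified': 481, 'excluded': 207, 'included': 274}
```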

Limitations
Not applicable.

Ethics Statement
The authors have read and followed the ethical requirements for publication in Data in Brief and confirm that the current work does not involve human subjects, animal experiments, or any data collected from social media platforms.

Data Availability
Bibliographic dataset and analysis of machine learning in space weather (Original data) (Mendeley).