Big data analytics and machine learning: A retrospective overview and bibliometric analysis
Introduction
The ‘millennial’ decade has witnessed several technological breakthroughs, like the emergence of big data concepts and the potential benefits accruing from data science to society (Larson & Chang, 2016) that can be traced back in time to catalogs in the library (Candela et al., 2007). Thus, along with capital and labor, data has emerged as an essential resource for generating prosperity in society.
Data processing began with traditional extract, transform, and load (ETL) processes that moved data from enterprise resource planning (ERP) support systems into data warehouses. According to Kelly (2014), however, these techniques were not scalable, especially given the enormous increase in data volume. Big data therefore evolved as firms realized that, to gain competitive advantage, investment in data analytics is as important as investment in products, processes, and technology (Vohra et al., 2012).
This evolution was driven by the advent of unstructured data, which could not be processed directly with traditional tools. It required special data handling and information processing techniques, such as natural language processing (NLP) and machine learning (ML), which entered mainstream research and practice. Information processing has therefore become vital for decision-makers, particularly in key areas like the stock market, where sentiment analysis of news has proved crucial in predicting earnings and stock returns (Tetlock, 2007).
Further, the revolution demanded the integration of the cloud, the Internet of Things (IoT), and big data analytics (Press, 2014; Somani et al., 2019). Managing computing resources and facilitating the development of decision-support systems for socio-economic development ultimately depend on protecting data integrity.
The “big data” revolution coupled globalization and the economic growth of nations with the growing need of businesses and service providers to meet global challenges and derive competitive advantage. This phenomenon has multiplied the demand for sharper big data analytics tools to solve complex business problems across multiple domains, including operations, finance, marketing, health care, strategy governance, security, and managerial decision-making. That demand paved the way for an entirely new discipline, “Business Analytics,” which applies the technical aspects of big data analytics tools, known as “Data Science,” to a managerial or business context in order to make meaningful decisions.
Research in the domain of big data analytics has gained significant traction in recent years (Batistič & der Laken, 2019). Consequently, several studies have used bibliometric protocols to summarize the extant knowledge in the field of big data analytics. For example, Kaffash, Nguyen, and Zhu (2020) conducted a review of multidisciplinary perspectives on big data analytics applications and algorithms in the transportation sector. Similarly, Zhang, Yu, and Zhang (2020) reviewed the application of big data analytics in the context of sustainable supply chains. Furthermore, recent reviews summarized big data analytics applications in the agricultural context (Kamble, Gunasekaran, & Gawankar, 2020) and in disaster management (Akter & Wamba, 2019). Despite these important attempts to synthesize the extant literature, the literature on the emergence of the latest technologies like ML and artificial intelligence (AI) seems fragmented (Batistič & der Laken, 2019). The different aspects of ML and their scope for future research have not been captured. There is an evident need for research that provides a comprehensive understanding of the past, present, and future of research on big data analytics in ML. Therefore, this paper addresses the gap in bibliometric studies and extends the bibliometric survey of big data analytics to capture its culmination in ML. The study is guided by the following three research questions: (1) What is the focus of the current research on big data analytics and ML? (2) What are the key themes and topic clusters in big data and ML, and how have they evolved? (3) Based on the analysis, what is the scope for future research for both theoretical researchers and practitioners?
This paper provides a bibliographic overview in line with the bibliometric overviews of dos Santos et al. (2019) and Khanra et al. (2020). The paper also generalizes journal-wide bibliometric studies, such as the analysis of supply chain management techniques in the journal “Computers and Industrial Engineering” by Cancino et al. (2019), the analysis of the Industrial Marketing Management journal by Martínez-López et al. (2020), the 45-year bibliometric analysis of the Journal of Business Research by Donthu, Kumar, and Pattnaik (2020), the bibliometric analysis of the Journal of Higher Education Management by Antia-Obong et al. (2019), and the 20-year bibliometric analysis of the Journal of Global Information Management by Srivastava et al. (2021).
The paper presents an in-depth analysis of the citation and publication trends in big data analytics and ML between 2006 and 2020. This period was chosen based on data access availability in Scopus (Singh, 2019; Baas et al., 2020) and to capture the latest publishing trends. The most significant authors, organizations, countries, and journals are presented. The major themes discussed are highlighted, and the articles are classified into five bibliographic clusters based on frequently occurring keywords. The approach identifies the significant themes by examining the co-occurrences of author-specified keywords. The frequent topics are also indicated through word cloud analysis, and citation structure analysis is performed on the groups to highlight the emerging themes.
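The co-occurrence step described above can be illustrated with a minimal sketch. It assumes each article is available as a list of author-specified keywords; the `articles` sample and the `keyword_cooccurrence` helper are hypothetical and are not taken from the study's data or tooling.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how often each unordered keyword pair appears in the same article."""
    pair_counts = Counter()
    for keywords in records:
        # Normalize case and drop duplicates within a single article.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical author keywords for four articles (illustrative only).
articles = [
    ["Big data", "Machine learning", "Supply chain"],
    ["Big data", "Machine learning"],
    ["Machine learning", "Deep learning"],
    ["Big data", "Supply chain"],
]

cooccurrence = keyword_cooccurrence(articles)
for pair, count in cooccurrence.most_common():
    print(pair, count)
```

Pairs with high counts are candidates for the same thematic cluster; a real analysis would feed such counts into a dedicated mapping or clustering tool rather than inspect them by hand.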
Big data analytics and ML
Big data analytics (Sivarajah et al., 2017) is the practice of analysing large volumes of data using sophisticated tools and techniques to extract valuable insights and solve business use-cases. The need to solve business problems with precision led to the evolution of ML, which involves training models on existing volumes of data to respond to a specific business problem and validating them on an unseen business scenario to make accurate decisions.
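The train-then-validate idea can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the nearest-centroid classifier is a stand-in chosen for brevity, and the names `train_centroids` and `predict` are hypothetical rather than anything described in the paper.

```python
import random


def train_centroids(rows):
    """Fit a nearest-centroid classifier by averaging the features of each label."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2 for f, c in zip(features, centroids[label])),
    )


# Synthetic, clearly separated two-class data (illustrative only).
rng = random.Random(0)
data = [([rng.uniform(0, 1), rng.uniform(0, 1)], "low") for _ in range(50)]
data += [([rng.uniform(4, 5), rng.uniform(4, 5)], "high") for _ in range(50)]
rng.shuffle(data)

# Train on 80% of the rows and validate on the remaining, unseen 20%.
split = int(0.8 * len(data))
train_rows, holdout_rows = data[:split], data[split:]

centroids = train_centroids(train_rows)
correct = sum(predict(centroids, x) == y for x, y in holdout_rows)
accuracy = correct / len(holdout_rows)
print(f"holdout accuracy: {accuracy:.2f}")
```

Keeping the holdout rows out of training is what makes the accuracy figure an estimate of performance on an unseen scenario rather than a measure of memorization.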
The application areas of big data analytics are
Results
The results of the descriptive and bibliometric analysis are illustrated and elucidated in this section.
Discussion
The key findings from the bibliometric study and the areas with scope for future research are presented in this section. The first research question was about the current status of research on big data analytics in ML. The key findings answer the research question through bibliometrics.
Conclusion
This study presents a bibliometric analysis of big data analytics and ML, drawing on journal articles indexed in Scopus. It identified essential themes related to big data analytics in ML and proposed emerging areas for research. At the same time, it highlighted the stale and saturated areas of research that need inter-disciplinary intervention from more emerging domains. Based on the thematic areas, the future research agenda is presented, and future areas are discussed. Thus, big
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (102)
- et al.
A bibliometric analysis of international impact of business incubators
Journal of Business Research
(2016)
- et al.
bibliometrix: An R-tool for comprehensive science mapping analysis
Journal of Informetrics
(2017)
- et al.
The internet of things: A survey
Computer Networks
(2010)
- et al.
Machine learning models and bankruptcy prediction
Expert Systems with Applications
(2017)
- et al.
Twitter mood predicts the stock market
Journal of Computational Science
(2011)
- et al.
The Anatomy of a Large-scale Hypertextual Web Search Engine
Computer Networks and ISDN Systems
(1998)
- et al.
A bibliometric analysis of supply chain analytical techniques published in Computers & Industrial Engineering
Computers & Industrial Engineering
(2019)
- et al.
Forty years of Computers & Industrial Engineering: A bibliometric analysis
Computers & Industrial Engineering
(2017)
- et al.
A bibliometric analysis of creativity in the field of business economics
Journal of Business Research
(2018)
- et al.
A bibliometric analysis of Genetic Algorithms throughout the history
Computers & Industrial Engineering
(2017)
Forty-five years of Journal of Business Research: A bibliometric analysis
Journal of Business Research
The evolution of the most important research topics in organic and perovskite solar cell research from 2008 to 2017: A bibliometric literature review using bibliographic coupling analysis
Solar Energy Materials and Solar Cells
Beyond the hype: Big data concepts, methods, and analytics
International Journal of Information Management
Big data and predictive analytics for supply chain and organizational performance
Journal of Business Research
Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications
International Journal of Production Economics
Achieving sustainable performance in a data-driven agriculture supply chain: A review for research and applications
International Journal of Production Economics
A review and future direction of agile, business intelligence, analytics and data science
International Journal of Information Management
Industrial marketing management: Bibliometric overview since its foundation
Industrial Marketing Management
A bibliometric overview of the Journal of Business Research between 1973 and 2014
Journal of Business Research
In bot we trust: A new methodology of chatbot performance measures
Business Horizons
Bibliometric analysis of water–energy–food nexus: Sustainability assessment of renewable energy
Current Opinion in Environmental Science & Health
Learning in massive open online courses: Evidence from social media mining
Computers in Human Behavior
Co-citation and cluster analyses of extant literature on social networks
International Journal of Information Management
Critical analysis of Big Data challenges and analytical methods
Journal of Business Research
Big data analytics in supply chain management between 2010 and 2016: Insights to industries
Computers & Industrial Engineering
Brands as relationship builders in the virtual world: A bibliometric analysis
Electronic Commerce Research and Applications
Big data analytics in logistics and supply chain management: Certain investigations for research and applications
International Journal of Production Economics
An empirical study on business analytics affordances enhancing the management of cloud computing data security
International Journal of Information Management
Big data and disaster management: A systematic review and agenda for future research
Annals of Operations Research
A Bibliometric Analysis of Journal of Higher Education Management (JHEM) from 2007 to 2016
Library Philosophy and Practice
That ‘internet of things’ thing
RFID Journal
Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies
Quantitative Science Studies
History, evolution and future of big data and analytics: A bibliometric analysis of its relationship to performance in organizations
British Journal of Management
Understanding the capabilities of Big Data Analytics for manufacturing process: Insights from literature review and multiple case study
Computers & Industrial Engineering
Pattern recognition and machine learning
The attractiveness of political markets: Implications for firm strategy
Academy of Management Review
The DELOS Digital Library Reference Model
Business intelligence and analytics: From big data to big impact
MIS Quarterly
Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs
IEEE Transactions on Pattern Analysis and Machine Intelligence
Big data: A survey
Mobile Networks and Applications
Enterprise analytics: Optimize performance, process, and decisions through big data
MapReduce: simplified data processing on large clusters
Communications of the ACM
Data Mining and Machine Learning techniques applied to Public Health Problems: A bibliometric analysis from 2009 to 2018
Computers & Industrial Engineering
Predictive analytics. Extending the Value of Your Data Warehousing Investment
TDWI Best Practices Report
Improving the use of analytics and big data by changing the decision-making culture
Management Decision
Citation analysis of grey literature, reflected in dissertations of library and information science
International Journal of Library & Information Science
The virtue of simplicity: On machine learning models in algorithmic trading
Big Data & Society
Business incubators and accelerators: A co-citation analysis-based, systematic literature review
The Journal of Technology Transfer
- 1 ORCID: 0000-0002-4074-9505.
- 2 ORCID: 0000-0001-7467-5500.
- 3 ORCID: 0000-0002-7526-206X.
- 4 ORCID: 0000-0001-6605-7745.