Big data analytics and machine learning: A retrospective overview and bibliometric analysis

https://doi.org/10.1016/j.eswa.2021.115561

Abstract

The research and practice of information systems have progressed by leaps and bounds due to the emergence of big data analytics and machine learning. This paper undertakes a bibliometric study that analyzes the contributions of major authors, universities/organizations, and countries in terms of productivity, citations, and bibliographic coupling. The study is based on a sample of 2160 articles from Scopus for the period 2006–2020. The publications are grouped into five clusters, of which Cluster 1 is consistently dominant in the information systems publication landscape. Cluster 2 covers the Internet of Things, security, and cloud computing, which have also been widely researched. Cluster 3, the third-largest, investigates social media analytics. Cluster 4 examines the impact of classification and prediction, a topic that has sustained research interest. Topics with scant coverage are concentrated in Cluster 5, indicating saturation in the area and the need for inter-disciplinary studies. Our results offer insights into emerging research topics for potential contributors and a global audience.

Introduction

The ‘millennial’ decade has witnessed several technological breakthroughs, such as the emergence of big data concepts and the potential benefits that data science brings to society (Larson & Chang, 2016), whose roots can be traced back to library catalogs (Candela et al., 2007). Along with capital and labor, data has thus emerged as an essential resource for generating prosperity in society.

Data processing began with traditional extract, transform, and load (ETL) processes that moved data from enterprise resource planning (ERP) and other enterprise application systems into data warehouses. According to Kelly (2014), these techniques were not scalable, especially given the enormous increase in data volume. Big data therefore evolved as firms realized that, to gain competitive advantage, investing in data analytics is as important as investing in products, processes, and technology (Vohra et al., 2012).
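
To make the ETL pattern concrete, the following is a minimal sketch of a single batch run. The source records, the `sales_fact` table, and the in-memory SQLite database standing in for a warehouse are all hypothetical illustrations; they are not taken from Kelly (2014) or from the study.

```python
import sqlite3

# Hypothetical rows as they might be exported from an operational (e.g. ERP) system.
SOURCE_ROWS = [
    {"order_id": "A-1", "region": " north ", "amount": "120.50"},
    {"order_id": "A-2", "region": "South", "amount": "80"},
    {"order_id": "A-3", "region": "north", "amount": "bad-value"},
]


def extract():
    """Extract: read raw records from the source system (here, a static list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: normalize region names, cast amounts, drop rows that fail parsing."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip records whose amount cannot be parsed
        clean.append((row["order_id"], row["region"].strip().lower(), amount))
    return clean


def load(conn, rows):
    """Load: insert the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id TEXT PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run one full extract-transform-load batch into a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn


if __name__ == "__main__":
    conn = run_etl()
    query = "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
    for region, total in conn.execute(query):
        print(region, total)
```

Every step here processes the whole batch in one process on one machine, which illustrates the kind of design that, as the text notes, struggles to keep up as data volumes grow.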

This evolution was driven by the advent of unstructured data, which traditional tools could not process directly. Such data required specialized data handling and information processing techniques, such as natural language processing (NLP) and machine learning (ML), which then entered mainstream research and practice. Information processing has therefore become vital for decision-makers, particularly in key areas such as the stock market, where sentiment extracted from news has proved valuable in predicting earnings and stock returns (Tetlock, 2007).
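
One common way to turn news text into a sentiment signal is dictionary-based word counting. The sketch below illustrates that general technique only; the tiny `POSITIVE` and `NEGATIVE` word lists and the example headlines are hypothetical, and the code is not claimed to reproduce the procedure used by Tetlock (2007).

```python
import re

# Hypothetical, deliberately tiny word lists; real studies use large curated dictionaries.
POSITIVE = {"beat", "growth", "gain", "strong", "record"}
NEGATIVE = {"miss", "loss", "decline", "weak", "lawsuit"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_score(text):
    """Return (positive count - negative count) / total tokens, a value in [-1, 1]."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


if __name__ == "__main__":
    headlines = [
        "Firm posts record growth and strong margins",
        "Shares slide after earnings miss and lawsuit",
    ]
    for h in headlines:
        print(f"{sentiment_score(h):+.3f}  {h}")
```

A score like this would typically be aggregated over many articles and used as one input to a forecasting model; on its own, a small word list ignores negation and context.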

Further, the revolution demanded the integration of the cloud, the Internet of Things (IoT), and big data analytics (Press, 2014; Somani et al., 2019). Managing this computing infrastructure and supporting the development of decision-support systems for socio-economic development ultimately depend on protecting data integrity.

The “big data” revolution coincided with globalization and the economic growth of nations, alongside the growing need of businesses and service providers to meet global challenges and gain competitive advantage. This phenomenon has multiplied the demand for sharper big data analytics tools to solve complex business problems across multiple domains, including operations, finance, marketing, health care, strategy, governance, security, and managerial decision-making. The demand paved the way for an entirely new discipline, “Business Analytics,” which applies the technical side of big data analytics, known as ‘Data Science’, to managerial or business contexts to support meaningful decisions.

Research in the domain of big data analytics has gained significant traction in recent years (Batistič & van der Laken, 2019). Consequently, several studies have used bibliometric methods to summarize the extant knowledge in the field. For example, Kaffash, Nguyen, and Zhu (2020) reviewed multidisciplinary perspectives on big data analytics applications and algorithms in the transportation sector. Similarly, Zhang, Yu, and Zhang (2020) reviewed the application of big data analytics in the context of sustainable supply chains. Furthermore, recent reviews have summarized big data analytics applications in agriculture (Kamble, Gunasekaran, & Gawankar, 2020) and in disaster management (Akter & Wamba, 2019). Despite these important attempts to synthesize the extant literature, the literature on the latest technologies, such as ML and artificial intelligence (AI), remains fragmented (Batistič & van der Laken, 2019). The different aspects of ML and their scope for future research have not been captured. There is a clear need for research that provides a comprehensive understanding of the past, present, and future of research on big data analytics in ML. This paper addresses this gap by extending the bibliometric survey of big data analytics to capture its culmination in ML. The study addresses three research questions: (1) What is the focus of current research on big data analytics and ML? (2) What are the key themes and topic clusters in big data and ML, and how have they evolved? (3) What scope for future research does the analysis reveal for both theoretical researchers and practitioners?

This paper provides a bibliometric overview in line with those of dos Santos et al. (2019) and Khanra et al. (2020). It also generalizes journal-wide bibliometric studies, such as the analysis of supply chain management techniques in the journal “Computers & Industrial Engineering” by Cancino et al. (2019), the analysis of Industrial Marketing Management by Martínez-López et al. (2020), the bibliometric analysis of 45 years of the Journal of Business Research by Donthu, Kumar, and Pattnaik (2020), the bibliometric analysis of the Journal of Higher Education Management by Antia-Obong et al. (2019), and the bibliometric analysis of 20 years of the Journal of Global Information Management by Srivastava et al. (2021).

The paper presents an in-depth analysis of citation and publication trends in big data analytics and ML between 2006 and 2020. This period was chosen based on data availability in Scopus (Singh, 2019; Baas et al., 2020) and to capture the latest publishing trends. The most significant authors, organizations, countries, and journals are presented. The major themes are highlighted, and the articles are classified into five bibliographic clusters based on frequently occurring keywords. Significant themes are identified by examining co-occurrences of author-specified keywords. Frequent topics are also shown through word cloud analysis, and citation structure analysis is performed on the clusters to highlight emerging themes.
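
The counts underlying these analyses can be illustrated on a toy dataset. The sketch below computes keyword co-occurrence (the number of articles listing both keywords), bibliographic coupling strength (the number of references two articles share), and keyword frequencies of the kind a word cloud is drawn from. The `ARTICLES` records are hypothetical, the paper does not describe its implementation in this section, and the clustering step applied to such counts is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: author keywords and cited references per article.
ARTICLES = {
    "P1": {"keywords": {"big data", "machine learning", "iot"}, "refs": {"R1", "R2", "R3"}},
    "P2": {"keywords": {"big data", "cloud computing", "iot"}, "refs": {"R2", "R3", "R4"}},
    "P3": {"keywords": {"machine learning", "social media"}, "refs": {"R5"}},
}


def keyword_cooccurrence(articles):
    """Count, for each unordered keyword pair, the number of articles listing both."""
    counts = Counter()
    for record in articles.values():
        for pair in combinations(sorted(record["keywords"]), 2):
            counts[pair] += 1
    return counts


def coupling_strength(articles):
    """Bibliographic coupling: number of references shared by each pair of articles."""
    strength = {}
    for a, b in combinations(sorted(articles), 2):
        shared = len(articles[a]["refs"] & articles[b]["refs"])
        if shared:
            strength[(a, b)] = shared
    return strength


def keyword_frequencies(articles):
    """Keyword frequencies across all articles (typical input for a word cloud)."""
    return Counter(k for record in articles.values() for k in record["keywords"])


if __name__ == "__main__":
    print(keyword_cooccurrence(ARTICLES).most_common(3))
    print(coupling_strength(ARTICLES))
    print(keyword_frequencies(ARTICLES).most_common(3))
```

Sorting each keyword set before pairing ensures that a pair such as ("big data", "iot") is always counted under the same key, whichever article it appears in.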

Big data analytics and ML

Big data analytics (Sivarajah et al., 2017) is the analysis of large volumes of data using sophisticated tools and techniques to extract valuable insights and solve business use cases. The need to solve business problems with precision led to the evolution of ML, which involves training models on existing data to address a specific business problem and validating them on unseen business scenarios so that they support accurate decisions.
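
The train-then-validate cycle described above can be sketched with a deliberately simple classifier. Nearest-centroid classification is swapped in purely for illustration; the customer data, the "stay"/"churn" labels, and the 25% holdout fraction are hypothetical and do not come from the paper.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as unseen test data."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_centroids(train):
    """Training: compute the mean feature vector of each class label."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, centroids[label])),
    )


def accuracy(centroids, test):
    """Validation: share of held-out rows whose label is predicted correctly."""
    hits = sum(predict(centroids, f) == label for f, label in test)
    return hits / len(test)


# Hypothetical data: (monthly spend, support tickets) -> "stay" or "churn".
data = [((50 + i, 1), "stay") for i in range(20)] + [((10 + i, 6), "churn") for i in range(20)]
train, test = train_test_split(data)
model = fit_centroids(train)

if __name__ == "__main__":
    print(f"held-out accuracy: {accuracy(model, test):.2f}")
```

The essential design choice is that accuracy is measured only on rows the model never saw during training, which mirrors the validation on an unknown business scenario described in the text. Real applications would use richer models and data, but the same separation of training and evaluation data applies.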

The application areas of big data analytics are

Results

The results of the descriptive and bibliometric analysis are illustrated and elucidated in this section.

Discussion

This section presents the key findings from the bibliometric study and the areas with scope for future research. The first research question concerned the current status of research on big data analytics in ML, and the bibliometric findings answer it.

Conclusion

This study presents a bibliometric analysis of journal articles on big data analytics and ML drawn from Scopus. It identified essential themes related to big data analytics in ML and proposed emerging areas for research. At the same time, it identified stale and saturated areas of research that need inter-disciplinary input from emerging domains. Based on these thematic areas, a future research agenda is presented and discussed. Thus, big

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (102)

  • N. Donthu et al.

    Forty-five years of Journal of Business Research: A bibliometric analysis

    Journal of Business Research

    (2020)
  • P. Fonteyn et al.

    The evolution of the most important research topics in organic and perovskite solar cell research from 2008 to 2017: A bibliometric literature review using bibliographic coupling analysis

    Solar Energy Materials and Solar Cells

    (2020)
  • A. Gandomi et al.

    Beyond the hype: Big data concepts, methods, and analytics

    International Journal of Information Management

    (2015)
  • A. Gunasekaran et al.

    Big data and predictive analytics for supply chain and organizational performance

    Journal of Business Research

    (2017)
  • B.T. Hazen et al.

    Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications

    International Journal of Production Economics

    (2014)
  • S.S. Kamble et al.

    Achieving sustainable performance in a data-driven agriculture supply chain: A review for research and applications

    International Journal of Production Economics

    (2020)
  • D. Larson et al.

    A review and future direction of agile, business intelligence, analytics and data science

    International Journal of Information Management

    (2016)
  • F.J. Martínez-López et al.

    Industrial marketing management: Bibliometric overview since its foundation

    Industrial Marketing Management

    (2020)
  • J.M. Merigó et al.

    A bibliometric overview of the Journal of Business Research between 1973 and 2014

    Journal of Business Research

    (2015)
  • A. Przegalinska et al.

    In bot we trust: A new methodology of chatbot performance measures

    Business Horizons

    (2019)
  • S.A. Sarkodie et al.

    Bibliometric analysis of water–energy–food nexus: Sustainability assessment of renewable energy

    Current Opinion in Environmental Science & Health

    (2020)
  • C.-W. Shen et al.

    Learning in massive open online courses: Evidence from social media mining

    Computers in Human Behavior

    (2015)
  • W.-L. Shiau et al.

    Co-citation and cluster analyses of extant literature on social networks

    International Journal of Information Management

    (2017)
  • U. Sivarajah et al.

    Critical analysis of Big Data challenges and analytical methods

    Journal of Business Research

    (2017)
  • S. Tiwari et al.

    Big data analytics in supply chain management between 2010 and 2016: Insights to industries

    Computers & Industrial Engineering

    (2018)
  • C. Veloutsou et al.

    Brands as relationship builders in the virtual world: A bibliometric analysis

    Electronic Commerce Research and Applications

    (2020)
  • G. Wang et al.

    Big data analytics in logistics and supply chain management: Certain investigations for research and applications

    International Journal of Production Economics

    (2016)
  • Z. Wang et al.

    An empirical study on business analytics affordances enhancing the management of cloud computing data security

    International Journal of Information Management

    (2020)
  • D. Agrawal et al.

    ...

    (2011)
  • S. Akter et al.

    Big data and disaster management: A systematic review and agenda for future research

    Annals of Operations Research

    (2019)
  • S.E. Antia-Obong et al.

    A Bibliometric Analysis of Journal of Higher Education Management (JHEM) from 2007 to 2016

    Library Philosophy and Practice

    (2019)
  • K. Ashton

    That ‘internet of things’ thing

    RFID Journal

    (2009)
  • J. Baas et al.

    Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies

    Quantitative Science Studies

    (2020)
  • S. Batistič et al.

    History, evolution and future of big data and analytics: A bibliometric analysis of its relationship to performance in organizations

    British Journal of Management

    (2019)
  • A. Belhadi et al.

    Understanding the capabilities of Big Data Analytics for manufacturing process: Insights from literature review and multiple case study

    Computers & Industrial Engineering

    (2019)
  • C.M. Bishop

    Pattern recognition and machine learning

    (2006)
  • J.-P. Bonardi et al.

    The attractiveness of political markets: Implications for firm strategy

    Academy of Management Review

    (2005)
  • L. Candela et al.

    The DELOS Digital Library Reference Model

    (2007)
  • H. Chen et al.

    Business intelligence and analytics: From big data to big impact

    MIS Quarterly

    (2012)
  • L.-C. Chen et al.

    DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2017)
  • M. Chen et al.

    Big data: A survey

    Mobile Networks and Applications

    (2014)
  • T.H. Davenport

    Enterprise analytics: Optimize performance, process, and decisions through big data

    (2013)
  • J. Dean et al.

    MapReduce: Simplified data processing on large...

    (2004)
  • J. Dean et al.

    MapReduce: simplified data processing on large clusters

    Communications of the ACM

    (2008)
  • B.S. dos Santos et al.

    Data Mining and Machine Learning techniques applied to Public Health Problems: A bibliometric analysis from 2009 to 2018

    Computers & Industrial Engineering

    (2019)
  • W.W. Eckerson

    Predictive analytics: Extending the value of your data warehousing investment

    TDWI Best Practices Report

    (2007)
  • J.E. Frisk et al.

    Improving the use of analytics and big data by changing the decision-making culture

    Management Decision

    (2017)
  • V.D. Hajje et al.

    Citation analysis of grey literature, reflected in dissertations of library and information science

    International Journal of Library & Information Science

    (2020)
  • K.B. Hansen

    The virtue of simplicity: On machine learning models in algorithmic trading

    Big Data & Society

    (2020)
  • J.P. Hausberg et al.

    Business incubators and accelerators: A co-citation analysis-based, systematic literature review

    The Journal of Technology Transfer

    (2020)

1 ORCID: 0000-0002-4074-9505.
2 ORCID: 0000-0001-7467-5500.
3 ORCID: 0000-0002-7526-206X.
4 ORCID: 0000-0001-6605-7745.
