What's on the horizon? A bibliometric analysis of personal data collection methods on social networks

In digital marketing, capturing consumers' personal data from social networks is essential for better targeting of commercial actions. The methods for collecting this information are attracting growing interest among the scientific community. This paper offers a comprehensive review of the literature on the issue and its management. To this end, a bibliometric study of 866 publications indexed in the Web of Science between 1997 and 2022 was conducted to identify the most relevant trends through analysis of the most significant articles, keywords, authors, institutions and countries. In addition, visualisation software (VOS viewer) was used to illustrate the relationships established through bibliographic coupling, keyword co-occurrence, authors and co-citation. The results indicate that the USA and Australia are the countries that publish the most in this field, while Finland and Australia have the highest number of publications per capita. Finally, the progress of research is discussed and future research directions are suggested.


Introduction
Social media (SM) is defined by Kaplan and Haenlein (2010, p. 61) as "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 and enable the creation and sharing of user-generated content." This easy exchange of content between people from different countries and backgrounds has generated a massive amount of information (Kapoor et al., 2018). It offers an enabling environment with massive scope for value propositions and a multitude of possibilities for digital marketing to promote and sell products and services, and it has given rise to new forms of consumer behaviour. A single SM platform can reach millions of users, who may become present and/or future consumers (Kumar et al., 2016). Consumer data can be captured on social media through digital marketing actions such as posting ads, running contests or quizzes, or soliciting support for social causes, among others (Desai, 2019). All these forms have increased tremendously in number in recent years (Statista, 2021). Once the information has been captured, today's technology allows for highly sophisticated analysis and very reliable predictions (De Luca et al., 2021; Dwivedi et al., 2021; Grewal et al., 2021; Grishikashvili et al., 2014; Kharchenko, 2019; Wang & Wang, 2020). Digital marketing is one of the areas that has benefited most from these techniques because of their effectiveness and usefulness for optimising strategies and actions (Choudrie et al., 2021; Lies, 2019).
Several studies have reviewed the literature on digital marketing via social media from different points of view (e.g. Alalwan et al., 2017; Ghorbani et al., 2021; Ismagilova et al., 2017; Kapoor et al., 2018; Krishen et al., 2021; Plume et al., 2016). While Kapoor et al. (2018), for instance, focus very specifically on just eight journals in the field of Information Systems, others, such as Krishen et al. (2021) and Ghorbani et al. (2021), take a broader, more multidisciplinary approach to digital marketing. These studies provide valid information, for example by noting that digital marketing research became steadily more widespread throughout the periods they studied. However, the discipline is evolving so rapidly that a bibliometric approach is needed that not only captures, aggregates and synthesises the most up-to-date information (Paul & Criado, 2020; Randhawa et al., 2016) but also adopts a narrower focus that can add more value to studies in the field (Kapoor et al., 2018).
The aim of this study is to review the literature by focusing on articles that have addressed data management within digital marketing and social media, based on the premise that a narrower spectrum enhances the value of the resulting information for research in the field (Kapoor et al., 2018). As for the database, although some authors claim that Scopus offers better coverage of the social sciences than Web of Science (WoS) (Chadegani et al., 2013), in this study we opted for WoS, following the recommendation of Kapoor et al. (2018). WoS encompasses an extensive collection of bibliographic databases and references to scientific publications that can be used to analyse the scientific performance and quality of research (Paul & Singh, 2017). The Web of Science Core Collection (WoS CC) is a database included in WoS that contains comprehensive bibliographic references, citation indexes and h-indexes of authors from different disciplines, including the one covered in this study. Among other tasks, this database can be used to extract detailed information on the total number of published articles, citations, h-index, citation thresholds and citations per article (Paul & Criado, 2020).
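To make the h-index used throughout this study concrete, the following Python sketch computes it from a list of per-article citation counts, using Hirsch's (2005) definition: the largest h such that h articles have at least h citations each. It is an illustrative re-implementation of that definition, not the procedure WoS CC applies internally, and the function name and citation counts are hypothetical.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts for one author's papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```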
In addition, a new bibliometric approach (Goyal & Kumar, 2021), based on indicators and using the VOS viewer software (VOS stands for "visualization of similarities"), was used to analyse WoS CC publications. The findings present information on the most cited papers and the main authors, institutions, countries and keywords in this field. Moreover, the relationships between the different fields analysed are illustrated graphically with maps obtained from the VOS viewer. Based on the mapping analysis method proposed by Merigó et al. (2018), we assess co-citation (Small, 1973), bibliographic coupling (Kessler, 1963), keyword co-occurrence and co-authorship (Merigó et al., 2016).
The remainder of the paper is organised as follows. Section 2 presents the theoretical framework. Section 3 summarises the bibliometric methods. The results obtained from WoS CC are presented in Section 4. Section 5 presents a graphic map of the bibliographic material produced with VOS viewer. Finally, Section 6 discusses the main findings and Section 7 summarises the main conclusions, limitations and future lines of research.

Theoretical framework
The scope of social networks and their potential to facilitate relationships between people from different backgrounds has led to a tenacious social structure (Kapoor et al., 2018). This is fertile ground for companies to build more effective relationships with their consumers, owing to easier and faster access, the ability to establish a conversation and, consequently, a greater exchange of information (Ghorbani et al., 2021). However, these attributes generate such an enormous volume of information that digital marketers can barely handle it, which prevents them from targeting their promotions accurately. Instead, it can lead to mass promotions that generate social overload, with negative psychological and behavioural consequences (Kapoor et al., 2018).
In order to mitigate the negative consequences of unfocused digital marketing, companies that gather marketing data from social media need to know what data they should collect, and for what purpose. For example, different kinds of data are required to sell products and services, to raise brand awareness, to generate traffic to online platforms, or to foster engagement between companies and users on social media (Bianchi & Andrews, 2015; Schultz & Peltier, 2013). Schweidel and Moe (2014) point out that access to personal data on potential customers gives companies greater control over the performance of their social media campaigns and improves the segmentation of their target audience. However, for the information to generate value for the promotion of their products and services, companies need the data to be as reliable as possible. There is evidence that the availability of quality data helps to build brand loyalty among consumers through engagement and sharing (Menon et al., 2019). The veracity of the data that consumers are willing to share will depend on how much they trust the companies and brands that request such information from them in the social media sphere. If consumers perceive companies and brands as untrustworthy (Fournier & Avery, 2011) or intrusive (Schultz & Peltier, 2013), they may be encouraged to supply false information. In contrast, other studies, such as Ashley and Tuten (2015) and Canhoto and Clark (2013), show that users want companies to be present on social media, and quote or tag brands in their posts, thus willingly offering their data. This discrepancy reveals a duality among consumers, with some wanting brands to be active on social media and others rejecting such practices.
This article reviews the literature on the collection of consumers' personal data from social media for digital marketing purposes. It does so by applying methods derived from bibliometrics and content analysis to assess the current state of the literature in an objective and quantifiable manner. The study analyses the volume of publications, journals, impact factors, most cited articles and authors, and the most prolific countries in order to identify the main current trends and future lines of research on the topic.

Methods
The bibliometric analysis was conducted in two phases: (1) an exploratory search based on the intersection of keywords and (2) manual screening to select only articles dealing with consumers. In the abundant literature on such methods (e.g. Broadus, 1987; Goyal & Kumar, 2021), bibliometrics is defined as a field of library and information science that studies bibliographic material using quantitative methods, synthesising vast amounts of information to generate evidence of themes and trends that help researchers in the field to better focus their lines of research (Gaviria-Marin et al., 2018). Although it is not a new methodology (Subramanyam, 1983), having existed in academia for more than a quarter of a century (Ding et al., 2014), its use is booming among the scientific community because of improvements in applied technology (Goyal & Kumar, 2021; Ruggeri et al., 2019; Dao et al., 2017).
Bibliometric analysis can draw on many indicators that are accepted by academia (Ruggeri et al., 2019; Cancino et al., 2017; Randhawa et al., 2016; Merigó et al., 2015). These include the total number of publications (Yi & Yang, 2014; Yu et al., 2018), the number of citations received (Radicchi et al., 2008; Valenzuela et al., 2017) and the h-index (Costas & Bordons, 2007; Hirsch, 2005), among others. It is also possible to analyse pairwise relationships, such as the degree of citation linkage between two publications (Wang et al., 2018). When two different publications both cite a third publication, this is called bibliographic coupling (Kessler, 1963), and when two different publications are cited by the same publication, this is called co-citation (Small, 1973). Author keywords, which usually appear below the abstract, are analysed through their co-occurrence. Network graphs are often used to visualise all these relationships (Laengle et al., 2018). In general, most studies try to cover as many indicators as possible in order to obtain a more complete and holistic picture of the results (Laengle et al., 2018; Tur-Porcar et al., 2018; Laengle et al., 2017).
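The two citation-based relations defined above can be illustrated with a small sketch. Assuming each citing publication is represented by the set of references it contains (the toy identifiers below are hypothetical), coupling strength counts the references two citing papers share, while co-citation counts how many papers cite two given works together. This is a simplified full-counting version, not the exact weighting or normalisation used by VOS viewer.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of works it cites.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"A", "D"},
}


def bibliographic_coupling(refs: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Coupling strength of two citing papers = number of references they share."""
    strength = {}
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:
            strength[(p, q)] = shared
    return strength


def co_citation(refs: dict[str, set[str]]) -> Counter:
    """Co-citation count of two cited works = number of papers citing both."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


print(bibliographic_coupling(references))  # P1 and P2 share two references (B, C)
print(co_citation(references))             # B and C are co-cited by P1 and P2
```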
This study narrows its focus to articles that address data management within digital marketing and social media. For this purpose, the Web of Science Core Collection database was searched for the last 25 years (January 1997 to April 2022). In the first phase, we searched for papers that matched the intersection of three keywords: "data*" AND "social media*" AND "digital marketing*". In the second phase, all documents were screened to retain only those published in indexed journals; their titles and abstracts were then reviewed manually, selecting only those dealing with consumer-related topics, which resulted in a total of 866 documents. To illustrate the links and interactions between the indicators, we followed the recommendation of Van Eck and Waltman (2010) to use VOS software to improve the visualisation of the results of this type of study. We also followed the recommendation of Merigó et al. (2018) on the use of VOS to display the bibliographic coupling of countries and to plot the co-citation results for authors and journals. The complete search and analysis procedure was conducted between February and March 2022.
L. Sáez-Ortuño et al.
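The two screening phases can be sketched in Python as follows. This is only an approximation under stated assumptions: the trailing "*" of each term is treated as truncation (zero or more further word characters), matching is done on a single hypothetical free-text field rather than on WoS's own topic index, and the `consumer_related` flag stands in for the manual title-and-abstract review. All record fields are hypothetical, not WoS export fields.

```python
import re

# Search terms as stated in the study; "*" is approximated as truncation.
TERMS = ["data*", "social media*", "digital marketing*"]


def term_pattern(term: str) -> re.Pattern:
    """Compile a case-insensitive pattern for one (possibly truncated) term."""
    stem = term.rstrip("*")
    suffix = r"\w*" if term.endswith("*") else r"\b"
    return re.compile(r"\b" + re.escape(stem) + suffix, re.IGNORECASE)


PATTERNS = [term_pattern(t) for t in TERMS]


def phase1_match(text: str) -> bool:
    """Phase 1: keep a record only if ALL terms occur (Boolean AND)."""
    return all(p.search(text) for p in PATTERNS)


def phase2_keep(record: dict) -> bool:
    """Phase 2: journal articles only, plus the (hypothetical) manual consumer flag."""
    return record["doc_type"] == "Article" and record["consumer_related"]


# Hypothetical records for illustration.
records = [
    {"text": "Big data from social media in digital marketing",
     "doc_type": "Article", "consumer_related": True},
    {"text": "Social media datasets for digital marketing teams",
     "doc_type": "Proceedings Paper", "consumer_related": True},
    {"text": "Digital marketing without social networks data",
     "doc_type": "Article", "consumer_related": True},
]

selected = [r for r in records if phase1_match(r["text"]) and phase2_keep(r)]
print(len(selected))  # 1
```

The second record passes phase 1 but is dropped in phase 2 because it is not a journal article; the third fails phase 1 because "social media" does not occur.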
Fig. 1 shows the evolution of the number of articles per year, and reveals a markedly exponential trend.Meanwhile, Table 1 presents an annual overview, showing that the 866 published documents received 17,243 citations.
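One way to quantify an exponential trend such as that in Fig. 1 is to fit a straight line to the logarithm of the annual counts; the slope then gives an implied annual growth rate. The sketch below is our illustration rather than a method used in the study, and the input series is hypothetical.

```python
import math


def annual_growth_rate(years: list[int], counts: list[int]) -> float:
    """Fit ln(count) = a + b*year by ordinary least squares and return e^b - 1.

    Years with zero publications are skipped because ln(0) is undefined.
    """
    pts = [(y, math.log(c)) for y, c in zip(years, counts) if c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(ly for _, ly in pts) / n
    sxy = sum((x - mean_x) * (ly - mean_y) for x, ly in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    slope = sxy / sxx  # growth of ln(count) per year
    return math.exp(slope) - 1


# Hypothetical series that doubles every year (not the study's data).
rate = annual_growth_rate([2018, 2019, 2020, 2021], [10, 20, 40, 80])
print(f"{rate:.0%}")  # 100%
```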

Results
This section presents the results of the bibliometric analysis of the sources that addressed data management issues within digital marketing and social media.It begins by presenting the publication and citation structure of the articles, and then describes the most influential articles and the top journals, followed by the main authors, institutions and countries.

Publication and citation structure of the collection of consumers' personal data from social networks
A summary analysis is provided in order to better understand the position of each global supra-region and region in terms of scientific contributions to this area of knowledge over the study period, beginning with an overview of the journals with the highest number of publications and citations. Table 1 shows that the first publication in this area appeared in 1997 and that, from then on, the number of publications increased steadily and exponentially. Articles published in 2012 received the highest number of citations (3,337), whereas the highest number of publications was in 2021, with 230 original contributions. The table also reports the number of studies that received at least 500, 200, 100, 50, 20, 10, 5 and 1 citations. Of all the articles published over these 25 years, only 1.27% received 200 or more citations and 3.12% received 100 or more, while 68.71% received at least one citation.
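The threshold columns of Table 1 can be reproduced from per-document citation counts. The sketch below assumes that each threshold t means "at least t citations" (the reading consistent with the percentages quoted above) and uses a hypothetical toy list rather than the study's data.

```python
def share_at_or_above(citations: list[int], thresholds: list[int]) -> dict[int, float]:
    """Percentage of documents with at least t citations, for each threshold t."""
    n = len(citations)
    return {t: round(100 * sum(c >= t for c in citations) / n, 2) for t in thresholds}


THRESHOLDS = [500, 200, 100, 50, 20, 10, 5, 1]

# Hypothetical citation counts for eight documents.
toy = [0, 0, 1, 3, 12, 55, 240, 600]
print(share_at_or_above(toy, THRESHOLDS))
```

As a consistency check on the reported figures, 1.27%, 3.12% and 68.71% of 866 documents correspond to 11, 27 and 595 documents respectively (for example, 100 × 11/866 ≈ 1.27).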
Table 2 lists the twenty journals that have published the highest number of articles on the collection of consumers' personal data from social networks. Sustainability published the highest absolute number of articles in this field, and also the highest absolute number of articles on all topics (14,030 documents in 2021), although none of its articles are among the 25 most cited. In contrast, the Journal of Business Research, which has the second highest number of articles in this category and 985 articles published in absolute terms, has three of the 25 most cited articles in this category. The highest number of citations per article per year in this area came from Information Communication & Society, with 234 citations/year. Table 3 shows the thirty most cited articles in this field of study, three of which were published in the Journal of Medical Internet Research. The most cited is on "critical issues for big data", while the second most cited is on "smart cities of the future".

Journals with most cited articles on the capture of consumers' personal data from social media
This part of the study presents the 30 articles that have generated the greatest impact in terms of the total number of citations. The list is shown in Table 3, which also provides the title of each paper, the name(s) of the author(s), the journal, the year of publication and the citations received per year. As shown in this table, one article clearly tops the ranking: Boyd and Crawford (2012), who propose an in-depth analysis of critical issues for big data. This paper has received 2,341 citations in total, or 234.10 citations per year. It is followed by Batty et al. (2012), who illustrate how new technologies and the information they generate can contribute to the design and improvement of new cities, with 817 citations in total and 81.7 per year. In third place is Day (2011), who raises the issue of the growing gap between the accelerating complexity of the new markets opened up by the internet and social networks and the limited capacity of organisations to respond, with 427 citations and 38.8 per year. It is also worth noting the heterogeneity among the authors of the 30 most cited contributions.
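The citations-per-year values quoted above are consistent with dividing total citations by the number of years between publication and 2022 (2,341/10 = 234.1; 817/10 = 81.7; 427/11 ≈ 38.8). The reference year of 2022 is our inference from these figures, not a formula stated in the study, and the handling of papers published in the reference year is a further assumption.

```python
REFERENCE_YEAR = 2022  # inferred from the quoted C/Y values, not stated in the study


def citations_per_year(total_citations: int, pub_year: int,
                       ref_year: int = REFERENCE_YEAR) -> float:
    """Total citations divided by the years elapsed since publication, to one decimal."""
    years = max(ref_year - pub_year, 1)  # assumption: papers from ref_year count as one year
    return round(total_citations / years, 1)


# Total citations and publication years quoted in the text.
for label, tc, year in [("Boyd & Crawford", 2341, 2012),
                        ("Batty et al.", 817, 2012),
                        ("Day", 427, 2011)]:
    print(label, citations_per_year(tc, year))
```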

Main authors, institutions and countries of publications on the capture of consumers' personal data from social networks
This part of the research presents a descriptive analysis of the most active authors, the institutions with which they are affiliated and the countries of those institutions. Summaries of this information are presented in Tables 4, 5, 6 and 7. Table 4 shows the 30 most active authors by number of publications. When two or more authors have the same number of publications, the author with the higher number of citations is ranked higher. Additional information, such as h-index, author affiliation and country of residence, is also presented. According to these results, Kelly of the University of Wollongong (Australia), Karjaluoto of the University of Jyvaskyla (Finland) and Freeman of the University of Sydney (Australia) are the three most prolific authors, while Crawford of Microsoft Research, Cambridge (USA) leads in terms of total citations, with 2,429.
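The ranking rule described above (publications first, total citations as tie-breaker) amounts to sorting on a two-part key. A minimal Python sketch, with hypothetical author records and a hypothetical `rank_authors` helper:

```python
from typing import NamedTuple


class AuthorStats(NamedTuple):
    name: str
    publications: int
    citations: int


def rank_authors(authors: list[AuthorStats], top: int = 30) -> list[AuthorStats]:
    """Sort by publications (descending); break ties by total citations (descending)."""
    return sorted(authors, key=lambda a: (-a.publications, -a.citations))[:top]


# Hypothetical authors, not the study's data.
authors = [
    AuthorStats("Author A", 5, 120),
    AuthorStats("Author B", 7, 40),
    AuthorStats("Author C", 5, 300),
]
print([a.name for a in rank_authors(authors)])  # ['Author B', 'Author C', 'Author A']
```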
Finally, this study describes the most productive and influential countries in terms of the number of publications. Table 5 shows the ranking of the 30 most active countries, the supra-regions to which they belong, and data on total publications, citations, h-index and population, as well as certain ratios between them; the final column shows each country's share of the 30 most cited articles. The countries contributing the highest number of publications are the USA, the UK, China, Australia and Spain. Although the USA tops the list in both number of publications and number of citations, it is followed in terms of citations received by Australia, the UK, Canada and Italy, in that order. Interestingly, China, with 56 papers, has received 738 citations, while Australia, with 54, has received 1,734 citations. The indicator of total publications per capita offers a more homogeneous perspective, as it adjusts for population size. According to this indicator, Finland and Australia have the highest number of publications per capita, while India and Pakistan, which rank 6th and 19th respectively in total publications, have among the lowest rates, at 0.02 and 0.04 papers per million inhabitants. A similar case is Indonesia, which ranks 23rd in total publications but has only 0.03 papers per million inhabitants.
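The two ratios discussed above can be computed as follows, assuming the units given in the abbreviations for Table 5 (population in thousands of inhabitants, TP/POP in papers per million inhabitants). The China and Australia figures are those quoted in the text; the per-capita example uses a hypothetical country.

```python
def tc_per_tp(total_citations: int, total_papers: int) -> float:
    """Average citations per paper (TC/TP), to two decimals."""
    return round(total_citations / total_papers, 2)


def tp_per_million(total_papers: int, population_thousands: float) -> float:
    """Papers per million inhabitants; population is given in thousands, as in Table 5."""
    return round(total_papers / (population_thousands / 1000), 2)


print(tc_per_tp(738, 56), tc_per_tp(1734, 54))  # 13.18 32.11 (China, Australia)
print(tp_per_million(20, 5000))  # 4.0 (hypothetical country: 20 papers, 5 million people)
```

On these figures, Australian papers received on average more than twice as many citations as Chinese ones, despite similar publication counts.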
Table 6 ranks the most productive and influential institutions on all bibliographic indicators: published papers, citations, h-index, citation-to-publication ratio, and articles with more than 50, 25 or 5 citations, together with the position of each institution in two global rankings, the Academic Ranking of World Universities (ARWU) and the Quacquarelli Symonds University Ranking (QS). The final column shows the number of articles that each institution has among the 30 most cited. The University of Sydney tops this list, followed by the University of Jyvaskyla and the University of North Carolina. Five of the top 10 universities are in the USA, which is in line with the results shown in Table 5. However, Crawford, the most cited author in this field, is not affiliated with any of the top 10 institutions. Finally, the aggregate results for each supra-region are shown in Table 7, which illustrates publication patterns by geographical region for total publications, total citations, h-index and the ratio of total citations to total publications.
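Aggregating country results into supra-regions, as in Table 7, requires some care: TP and TC can be pooled, but TC/TP should be recomputed from the pooled totals and the h-index recomputed from the pooled per-paper citations, rather than summed or averaged across countries. A paper co-authored by two countries in the same region should also be counted once. The sketch below illustrates these choices with hypothetical data and a hypothetical `aggregate_by_region` helper; the study does not state how it handled such cases.

```python
from collections import defaultdict


def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def aggregate_by_region(papers: list[dict], region_of: dict[str, str]) -> dict[str, dict]:
    """Pool papers by supra-region; a paper counts once per region even if
    several countries of that region co-authored it."""
    pooled: dict[str, dict[str, int]] = defaultdict(dict)  # region -> {paper_id: citations}
    for paper in papers:
        for country in paper["countries"]:
            pooled[region_of[country]][paper["id"]] = paper["citations"]
    summary = {}
    for region, cites in pooled.items():
        tp, tc = len(cites), sum(cites.values())
        summary[region] = {"TP": tp, "TC": tc,
                           "H": h_index(list(cites.values())),
                           "TC/TP": round(tc / tp, 2)}
    return summary


# Hypothetical papers and country-to-region mapping.
region_of = {"Spain": "Europe", "Finland": "Europe", "USA": "North America"}
papers = [
    {"id": "p1", "countries": ["Spain", "Finland"], "citations": 10},
    {"id": "p2", "countries": ["USA"], "citations": 4},
    {"id": "p3", "countries": ["Finland", "USA"], "citations": 2},
]
for region, stats in aggregate_by_region(papers, region_of).items():
    print(region, stats)
```

Summing the country rows instead would count paper p1 twice for Europe (once for Spain, once for Finland), giving TP = 3 rather than 2.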

Mapping results with VOS visualisation software
In this section, the VOS viewer software (Van Eck & Waltman, 2010) is used to graphically display the bibliographic coupling of countries and institutions, citations per university, co-citation of authors and journals, and co-occurrence of keywords defined by authors, as well as those extracted from article titles and abstracts. Fig. 2 shows the co-citation of journals in the field of this study with a threshold of 10 and the 100 most representative co-citation connections. Fig. 3 shows the co-citation of authors in the field of capture of consumers' personal data from social networks with a threshold of 10 and the 100 most representative co-citation connections. These results are in line with those obtained from the analysis of authors in Table 4. The different clusters are shown in different colours, and the links between them are also indicated. Another item analysed using this software is the bibliographic coupling of institutions, with a threshold of at least 3 publications and showing the 100 most representative connections. Fig. 4 shows that, consistent with Table 6, the University of Sydney and the University of Jyvaskyla are the most prominent institutions. The other universities in this ranking are mainly from the United States. The next analysis performed with the VOS viewer software concerns the co-occurrence of author keywords and provides an overview of the main keywords used in articles on the capture of consumers' personal data from social networks over the study period. As in the previous cases, this graph is in line with the results presented above. Fig. 5 shows the 100 strongest connections, with a threshold of five documents. As shown in the graph, the most common words are 'Social Media', 'Big Data' and 'Digital Marketing'. Fig. 6 shows the 100 strongest connections, with a threshold of five documents. As can be seen, the expressions 'Social Media', 'Impact', 'Big Data' and 'Word of Mouth' are the most prominent.

Abbreviations: C/Y = citations per year; TC = total citations.

Abbreviations: TP = total papers; TC = total citations; H = h-index; TC/TP = ratio of citations divided by publications.
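The keyword maps rest on a simple count: two keywords co-occur when they appear in the same document. The sketch below approximates the thresholding described above (a keyword must occur in at least a minimum number of documents, and only the strongest links are kept). It assumes lower-case normalisation of keywords and full counting, is not VOS viewer's actual implementation, and uses hypothetical keyword lists.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(docs: list[list[str]], min_occurrences: int = 5,
                       max_links: int = 100) -> list[tuple[tuple[str, str], int]]:
    """Return the strongest keyword pairs, counting each pair once per document.

    Keywords are lower-cased (an assumption) and kept only if they occur in at
    least `min_occurrences` documents; at most `max_links` pairs are returned.
    """
    keyword_sets = [{k.strip().lower() for k in kws} for kws in docs]
    occurrences = Counter(k for ks in keyword_sets for k in ks)
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}
    links: Counter = Counter()
    for ks in keyword_sets:
        for pair in combinations(sorted(ks & kept), 2):
            links[pair] += 1
    return links.most_common(max_links)


# Hypothetical author-keyword lists for four documents.
docs = [
    ["Social Media", "Big Data"],
    ["social media", "Digital Marketing"],
    ["Big Data", "Social Media", "Digital Marketing"],
    ["Word of Mouth"],
]
print(cooccurrence_links(docs, min_occurrences=2, max_links=2))
```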
Another element that was analysed using this software is the bibliographic coupling of countries, with a threshold of at least 3 publications and showing the 100 most representative connections. Fig. 7 shows that, as already seen in Table 5, the USA, the UK, China, Australia and Spain are the most prominent countries.

Abbreviations: TP = total papers; TC = total citations; H = h-index; TC/TP = ratio of citations divided by publications; Population = thousands of inhabitants; TP/POP = total papers per million inhabitants; TOP 30 = the 30 most cited papers.

Table 6
The most productive and influential institutions.

Discussion
The results show that the scientific literature on the capture of consumer data from social media for digital marketing purposes has increased exponentially over the last 25 years. This study not only identified the most productive authors, institutions, journals and countries through a traditional citation count, but also mapped their relationships using the VOS viewer bibliometric mapping software.
The early literature on this topic was heavily influenced by debates around data collection techniques (Cln, 2013), while the ethics of data collection, as well as misinformation and misreporting, have since become increasingly relevant, especially regarding more sensitive personal data such as health-related data (Nunan & Di Domenico, 2013; Dulhanty, 2021). Large corporations are also taking an interest in this area, with an increasing number of articles dealing with practical issues. The results also suggest that the capture of consumers' personal data from social networks has only been partially explored and still has a long way to go as technology evolves.

Abbreviations: H = h-index; TC/TP = ratio of citations divided by publications.
As is often the case with new topics, the first articles to be published were more generalist and mainly addressed data gathering techniques and the types of data that corporations were allowed to collect and use, while more recent contributions have tended to be much more critical and detailed in their analysis. Authors agree on the need to pay greater attention to ethical aspects, while recognising that proper data collection can help to improve the overall economy (Roche & Jamal, 2021). Indeed, studies confirm that proper data collection can greatly support objectives and strategies in digital marketing, as well as improve the conditions and results of less skilled workers in, for example, call centres (Ogilvie et al., 2017). Digital marketing should address all these data capture issues by reorganising its long-term strategy to capture data ethically and, above all, to avoid misinformation (Di Domenico & Visentin, 2020).

Conclusions, limitations and future lines of research
Given the exponential growth in the number of publications on social media, this study has adopted a narrower focus, considering only publications on data management within digital marketing and social media. We believe, in line with Kapoor et al. (2018), that a narrower, more synthesised scope can increase the value of the information to researchers in the field. A total of 866 publications covering the twenty-five years from 1997 to 2022 were extracted from the Web of Science Core Collection and analysed. The bibliometric analysis reveals an exponential increase in the number of publications and citations in this particular area during the period analysed.
This research contributes to the existing scientific literature on the collection of consumers' personal data from social networks. The main authors and journals have been identified: two of the three most prolific authors are affiliated with Australian institutions, while the third is affiliated with an institution in Finland.
The five most prolific countries in terms of publications are, in this order, the USA, the UK, China, Australia and Spain.Another relevant indicator is the number of publications per capita in each country, with Finland topping this ranking, followed by Australia.
The VOS viewer software was used to support the evidence obtained from the Web of Science Core Collection by mapping and graphically describing the results for the co-citation of journals and authors, the bibliographic coupling of countries and universities, and the co-occurrence of authors' keywords, thus making it visually easier to understand the relationships between the variables.
The tables in this study are based on various bibliometric measures to help readers to understand the trends in publications on the capture of consumers' personal data from social networks.This study is also useful to understand what lines of research have been pursued and what research remains to be done.Despite the growing interest in this area of study, there are many lines of research that have yet to be explored.
An attractive option for future research would be to perform the same analysis for the same period using different indicators (g-index, p-index, article influence score, etc.) so that the results of the two studies can be compared.
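As an example of such an alternative indicator, the g-index is the largest g such that the g most cited papers have together received at least g² citations. The sketch below caps g at the number of papers (some variants pad the list with uncited papers, which can yield a larger g) and uses hypothetical counts; for the same counts, the h-index would be 4, which shows how the g-index rewards a few highly cited papers.

```python
def g_index(citations: list[int]) -> int:
    """Largest g such that the g most cited papers have >= g**2 citations in total.

    g is capped at the number of papers; variants that pad with zero-cited papers differ.
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites  # cumulative citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one author's papers.
print(g_index([10, 8, 5, 4, 3]))  # 5
```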
There is also little literature on consumers' motivations for entering personal data on social networks in the context of digital marketing, which could be a promising line of research.It would also be interesting to analyse the generational and gender profiles of users who provide their personal data on social media.The existing literature also seems to make the general assumption that users always enter truthful data.There is very little literature analysing fraudulent consumer data and its direct effects on digital marketing.A future line of research would be to analyse this phenomenon, detecting the profiles of these consumers, as well as their motivations for not providing truthful information.
Another interesting line of research would be to perform a cluster analysis of the users who enter their personal data in social media.Although authors on this topic offer keywords that can be clustered, no evidence has been found of clusters of consumers who disclose their personal information on social media within the framework of digital marketing.Analysis of these clusters would provide relevant information for a more detailed understanding of the consumer.Companies would be able to offer products and services that are much more aligned with their interests, increasing the level of satisfaction among both parties.
Regarding keyword analysis, an interesting future line of research would be to analyse the means used to capture personal data from social networks, and it would also be useful to know what kinds of rewards and incentives most motivate consumers to give up their personal data.
This article is not without its limitations. There are restrictions related to the method used, the database and the choice of data. With regard to the method, only publications that correspond to the intersection of three keywords ("data*" AND "social media*" AND "digital marketing*") were considered; other combinations would have resulted in different lists of publications. In terms of the database, the clearest limitation lies in having restricted the bibliometric analysis to the Web of Science Core Collection. However, this was intentional, as such an in-depth analysis would not have been possible with a combination of several databases. It should be added that the limitations in terms of scope that apply to the Web of Science Core Collection also apply to this research. With regard to the choice of data, this was restricted to articles published in journals, discarding other sources such as books and conference papers. Had more sources been considered, the results would have been different. On this last point, however, the prevailing aim was to aggregate information as homogeneously as possible.
An interesting future line of research on the capture of personal data from social media would be to include a larger number of academic databases, such as Scopus and Google Scholar, to establish even more complete classifications of journals, academics, academic institutions and countries (Koberg & Longoni, 2018) and hence extrapolate the results to the full range of publications on this topic in relation to digital marketing.

Table 3
The 30 most cited documents.

Table 4
Top 30 leading authors.

Table 5
The most productive and influential countries.
Abbreviations are available in previous tables except for: ARWU and QS = Academic Ranking of World Universities and QS University Ranking.

Table 7
Publications by supranational regions.