Big data and social media: A scientometrics analysis



Introduction
The era of Big Data is underway: computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and other scholars are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions (Boyd et al., 2012). In March 2014, the Parliamentary Office of Science and Technology published an article (Houses of Parliament, number 460) that made several observations about social media and big data. In the UK, 57% of over-16s use social media, generating vast amounts of accessible data. Analyzing social media data can help organizations understand behaviors and target products and services more effectively. Key applications include profiling voters and complementing traditional polling, targeting adverts at consumers, credit scoring and informing policing decisions. There is a debate about how to analyze social media data, including which methods to use and how to control for biases. Personal data can be shared or sold with users' consent as long as they are anonymized. There are concerns that users are not fully aware of how their data are being used and that it is often possible to identify individuals by linking anonymized datasets. Analyzing large quantities of readily available data from social media has created new opportunities to understand and influence how people think and act. The rate at which unstructured data are produced on social media makes them difficult to analyze using traditional methods that rely on human analysts. Social media analytics is a new field of study that is developing automated or semi-automated methods for analyzing such data.

Some advocates of big data argue that the sheer size of the datasets reduces, or even eliminates, the need for established statistical methods such as random sampling, because all the data can be analyzed. However, social media data only cover people who use social media. In the UK, around 49% of the population use Facebook and 24% use Twitter, and not all users create content. There are concerns that social media data may not represent vulnerable groups in society, such as the elderly or those from lower income backgrounds. This means that there are significant gaps in the data, and there are not yet accepted methods for controlling for biases.
This paper presents an overview of studies on big data in social media. The study uses the Scopus database as its primary search engine and analyzes data for the period 2012-2019. We apply a science mapping technique to the topic of big data use in social media, using the Bibliometrix R-package, which performs bibliometric analysis and builds data matrices for co-citation, coupling, scientific collaboration and co-word analysis.

Table 1
The main information and summary

About Bibliometrix R-package
Science mapping is complex and confusing because it is multi-step and frequently requires numerous and diverse software tools. The Bibliometrix R-package is a tool for quantitative research in scientometrics and bibliometrics. It provides various routines for importing bibliographic data from the Scopus, Clarivate Analytics' Web of Science, PubMed and Cochrane databases, performing bibliometric analysis, and building data matrices for co-citation, coupling, scientific collaboration and co-word analysis (Aria et al., 2017).
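To make the idea of coupling and co-citation matrices more concrete, the following minimal Python sketch counts both quantities for a toy collection. It is an illustration only and not the Bibliometrix implementation; the document and reference identifiers and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy collection: each document maps to the set of references it cites.
cited = {
    "doc_A": {"ref_1", "ref_2", "ref_3"},
    "doc_B": {"ref_2", "ref_3"},
    "doc_C": {"ref_3", "ref_4"},
}

def coupling_strength(cited_refs):
    """Bibliographic coupling: number of references shared by each pair of documents."""
    strength = {}
    for a, b in combinations(sorted(cited_refs), 2):
        shared = len(cited_refs[a] & cited_refs[b])
        if shared:
            strength[(a, b)] = shared
    return strength

def cocitation_counts(cited_refs):
    """Co-citation: number of documents that cite both references of a pair."""
    counts = Counter()
    for refs in cited_refs.values():
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return counts

coupling = coupling_strength(cited)
cocitation = cocitation_counts(cited)
print(coupling)
print(cocitation.most_common(3))
```

Both measures are derived from the same document-by-reference relation: coupling compares documents through the references they share, while co-citation compares references through the documents that cite them together.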

Most cited countries
Our survey shows that the United States made the largest contribution to the field of big data in social media, followed by the United Kingdom and China. Table 2 shows the details of our survey.

Country Scientific Production
One area of interest is the contribution of different countries to research on big data in social media. As the results in Fig. 2 show, researchers from the USA (1289 papers), China (383 papers), India (305 papers), the UK (254 papers) and Australia (175 papers) have contributed the most to this field.

Fig. 2.
The frequency of the keywords used in different big data in social media studies

Highly cited papers (Most Global Cited Documents)
Table 4 summarizes the most cited articles. As the results in Table 4 show, the study by Boyd et al. (2012) has received the most citations. The second most cited work is Lazer et al. (2014), which investigated a trap in big data. The third most cited work is Kramer et al. (2014), which concerns an important and emerging area of social science research that needs to be approached with sensitivity and with vigilance regarding personal privacy issues. According to Stephens et al. (2015), genomics is a Big Data science and will become much bigger over time, but we still do not know whether the requirements of genomics will surpass those of other Big Data domains. Morone and Makse (2015) reported, on the basis of big data analyses, that the set of optimal influencers is much smaller than the one forecast by previous heuristic centralities.

The most common keywords
Table 5 lists some of the most frequently used keywords associated with big data in social media. As the results in Table 5 show, big data, social media and social networking (online) are three well-recognized keywords in the literature. Fig. 3 shows the most important words used over time.

Table 5
The most popular keywords used in studies associated with big data in social media

To see the growth and evolution of this network more tangibly, Fig. 6 shows the same graph over the period 2012-2016 (from the beginning of the survey until the first significant growth in article production).

Thematic Map (Well developed or not? Important or not?)
When co-word analysis is used for mapping science, clusters of keywords and their interconnections are obtained. These clusters are considered as themes. Each research theme obtained in this process is characterized by two parameters, namely "density" and "centrality". Both median and mean values of density and centrality can be used to classify themes into four groups. Within a theme, the keywords and their interconnections form a network graph called a "thematic network". In the thematic map, centrality is the horizontal axis and density is the vertical axis. In a network, a node that has many relations with other nodes has a higher centrality and occupies an essential position. Centrality is therefore used to measure the degree of correlation among different topics. Similarly, a higher density means higher cohesiveness, that is, a higher degree of internal correlation among nodes. The density of a research field represents its capability to maintain and develop itself.

The thematic map is a very intuitive plot, and themes can be analyzed according to the quadrant in which they are placed. The upper-right quadrant contains motor themes, the lower-right quadrant basic themes, the lower-left quadrant emerging or disappearing themes, and the upper-left quadrant very specialized or niche themes. Themes in the upper-right quadrant are both well developed and important for the structuring of a research field, such as "big data" and "big data analytics". Themes in the upper-left quadrant have well developed internal ties but unimportant external ties and so are of only marginal importance for the field, such as "social network". Themes in the lower-left quadrant are both "weakly developed and marginal", mainly representing either emerging or disappearing themes, such as "social media" and "Hadoop". Themes in the lower-right quadrant are "important for a research field but are not developed", so this quadrant groups transversal and general, basic themes such as "twitter". The thematic analysis suggests that, for better results, research could focus on "big data analytics" and "twitter", which are important topics in this field but not yet well developed.
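The quadrant classification of the thematic map can be sketched in a few lines of Python. The sketch rests on stated assumptions: centrality is taken as the sum of link weights between a theme and keywords outside it, density as the sum of internal link weights divided by the number of keywords in the theme, and the median is used as the cut-off. The themes, link weights and function names are hypothetical toy values, not results of this survey, and the scaling used by Bibliometrix may differ.

```python
from statistics import median

# Hypothetical themes (clusters of keywords) and weighted co-word links.
themes = {
    "big data": ["big data", "analytics"],
    "social network": ["social network", "graph"],
    "hadoop": ["hadoop", "mapreduce"],
    "twitter": ["twitter", "sentiment"],
}
links = {
    ("big data", "analytics"): 8,
    ("social network", "graph"): 6,
    ("hadoop", "mapreduce"): 1,
    ("twitter", "sentiment"): 2,
    ("big data", "twitter"): 5,
    ("analytics", "sentiment"): 3,
    ("big data", "hadoop"): 1,
    ("graph", "analytics"): 1,
}

def theme_metrics(themes, links):
    """Return {theme: (centrality, density)} under the assumptions stated above."""
    owner = {kw: name for name, kws in themes.items() for kw in kws}
    internal = dict.fromkeys(themes, 0)
    external = dict.fromkeys(themes, 0)
    for (a, b), weight in links.items():
        theme_a, theme_b = owner[a], owner[b]
        if theme_a == theme_b:
            internal[theme_a] += weight
        else:
            # A link between two themes counts towards the centrality of both.
            external[theme_a] += weight
            external[theme_b] += weight
    return {
        name: (external[name], internal[name] / len(themes[name]))
        for name in themes
    }

def classify(metrics):
    """Assign each theme to a quadrant; values equal to the median count as low."""
    centrality_cut = median(c for c, _ in metrics.values())
    density_cut = median(d for _, d in metrics.values())
    labels = {}
    for name, (centrality, density) in metrics.items():
        if centrality > centrality_cut and density > density_cut:
            labels[name] = "motor theme"
        elif centrality > centrality_cut:
            labels[name] = "basic theme"
        elif density > density_cut:
            labels[name] = "niche theme"
        else:
            labels[name] = "emerging or declining theme"
    return labels

metrics = theme_metrics(themes, links)
labels = classify(metrics)
for name, label in labels.items():
    print(name, metrics[name], label)
```

Using the mean instead of the median as the cut-off only requires replacing `median` with `statistics.mean`; the choice matters when a few themes have very large values.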

Conclusion
This study has tried to provide a comprehensive review of the published literature on big data in social media. The study has indicated that this field has been popular mostly among researchers in the USA, China, India, the UK and Australia. It has also indicated that researchers from the USA and the UK not only published a relatively high number of papers but also succeeded in publishing highly cited papers. Many studies of big data in social media have dealt with combinatorial optimization techniques, and our survey has concluded that meta-heuristic methods have been popular among researchers for locating near-optimal solutions. We hope this study can help other researchers find important research gaps.

Fig. 1. World Map collaboration (Social Structure)

As the results in Fig. 1 show, there was strong collaboration between researchers in the United States and researchers in other countries.

According to Bello-Orgaz et al. (2016), big data plays an essential role in a large number of research areas such as data mining, machine learning, computational intelligence, information fusion, the semantic Web, and social networks. The rise of big data frameworks such as Apache Hadoop and, more recently, Spark for processing huge volumes of data has provided an opportunity for efficient use of data mining techniques and machine learning methods in various domains. Bello-Orgaz et al. (2016) reviewed the new techniques designed to support data mining and information fusion from social media, and the new applications and frameworks currently available under the "umbrella" of the social networks, social media and big data paradigms. Mohr et al. (2013) concentrated on the barriers and costs associated with big data storage and noted that any improvement in the collection, storage, analysis and visualization of big data could help practitioners better target sales.

Fig. 3. The frequency of the keywords used in different big data in social media

Word Dynamics
A word dynamics graph based on keywords helps us learn more about how keywords evolve over time. Their growing or declining trends can help us choose a better topic in any survey. There are two types of keywords: Author keywords and Keywords plus. Author keywords are the ones that authors state in their articles, and Keywords plus are the result of Thomson Reuters editorial expertise in science. The editors review the titles of all references and highlight additional relevant but overlooked keywords that were not listed by the authors or publishers. With Keywords plus, it is possible to uncover more papers that may not have appeared in a search due to changes in scientific keywords over time.
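The kind of series plotted in a word dynamics graph can be sketched as follows. This minimal Python example counts, for each keyword, the number of documents per year and then accumulates the counts over time. The records and function names are hypothetical, and whether to plot annual or cumulative counts is a choice made here for illustration.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical records: (publication year, keywords of one document).
records = [
    (2012, ["big data", "social media"]),
    (2013, ["big data"]),
    (2013, ["big data", "data mining"]),
    (2014, ["social media", "data mining"]),
]

def yearly_counts(records, first_year, last_year):
    """Count, per keyword, how many documents use it in each year of the window."""
    years = list(range(first_year, last_year + 1))
    per_keyword = defaultdict(lambda: [0] * len(years))
    for year, keywords in records:
        if first_year <= year <= last_year:
            # set() ensures a keyword repeated within one document is counted once.
            for keyword in set(keywords):
                per_keyword[keyword][year - first_year] += 1
    return years, dict(per_keyword)

def cumulative(counts):
    """Turn annual counts into running totals, one series per keyword."""
    return {keyword: list(accumulate(series)) for keyword, series in counts.items()}

years, counts = yearly_counts(records, 2012, 2014)
growth = cumulative(counts)
for keyword, series in growth.items():
    print(keyword, dict(zip(years, series)))
```

A keyword whose cumulative series flattens has stopped attracting new documents, while a steadily rising series indicates a growing topic.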

Fig. 4. Keywords plus dynamic view over time

As the results in Fig. 4 show, big data, social media, social network (online) and data mining show good growth in the chart, unlike sentiment analysis and internet.

Conceptual structure, Co-occurrence network
A keyword co-occurrence network (KCN) focuses on understanding the knowledge components and knowledge structure of a scientific or technical field by examining the links between keywords in the literature. Fig. 5 focuses on the analysis methods based on KCNs, which have been used in theoretical and empirical studies to explore research topics and their relationships in selected scientific fields. If keywords are grouped into the same cluster, they are more likely to reflect identical topics. Each cluster has a different number of subject keywords.
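A keyword co-occurrence network can be built by counting how often two keywords appear in the same document. The Python sketch below does this for a toy collection and then groups keywords that are joined by sufficiently strong links. The grouping step uses connected components with a hypothetical threshold `min_weight`, which is a crude stand-in chosen for brevity and not the clustering algorithm used in this study.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical keyword lists, one per document.
docs_keywords = [
    ["big data", "social media", "twitter"],
    ["big data", "social media"],
    ["hadoop", "mapreduce"],
    ["hadoop", "mapreduce", "big data"],
]

def cooccurrence_edges(keyword_lists):
    """Weight of an edge = number of documents in which both keywords appear."""
    edges = Counter()
    for keywords in keyword_lists:
        for pair in combinations(sorted(set(keywords)), 2):
            edges[pair] += 1
    return edges

def clusters(edges, min_weight=2):
    """Connected components over links with weight >= min_weight.

    Keywords without any sufficiently strong link are left out of the result.
    """
    neighbours = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

edges = cooccurrence_edges(docs_keywords)
groups = clusters(edges, min_weight=2)
print(edges.most_common(2))
print(groups)
```

Lowering `min_weight` merges clusters through weaker links, which illustrates why the number of keywords per cluster depends on the clustering settings.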

Fig. 7. Thematic Map

Intellectual Structure, Historiograph
The historiographic map is a graph proposed by Garfield to represent a chronological network map of the most relevant direct citations resulting from a bibliographic collection. The citation network technique provides the scholar with a new modus operandi which may significantly affect future historiography.
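The construction behind a historiographic map can be sketched as a chronologically ordered direct citation network restricted to the collection. In the minimal Python example below, the paper identifiers, years and citation lists are hypothetical placeholders rather than the papers analysed in this survey, and storing each edge as (cited, citing) is a convention chosen for the sketch.

```python
from collections import Counter

# Hypothetical collection: id -> (publication year, ids of the works it cites).
papers = {
    "P1": (2012, []),
    "P2": (2013, ["P1", "X9"]),  # X9 lies outside the collection and is ignored
    "P3": (2014, ["P1", "P2"]),
    "P4": (2015, ["P3"]),
}

def direct_citation_edges(papers):
    """Edges (cited, citing) between papers of the collection, in chronological order."""
    edges = []
    for paper_id, (year, references) in papers.items():
        for ref in references:
            if ref in papers and papers[ref][0] <= year:
                edges.append((ref, paper_id))
    return sorted(edges, key=lambda e: (papers[e[0]][0], papers[e[1]][0]))

def local_citation_score(papers, edges):
    """Number of citations each paper receives from within the collection."""
    received = Counter(cited for cited, _ in edges)
    return {paper_id: received[paper_id] for paper_id in papers}

edges = direct_citation_edges(papers)
scores = local_citation_score(papers, edges)
print(edges)
print(scores)
```

Papers with a high local citation score that appear early in the ordering are natural candidates for the starting points of a research trend.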

Fig. 8. Historiograph

Fig. 8 shows that Boyd (2012), Wood (2013), Hay (2013) and Crampton (2013) initiated new trends in their time. The direction of the arrows in Fig. 8 shows the chronological change of research trends from the past. The research by Boyd (2012) was about the effects of big data on knowledge. Crampton (2013), Kramer (2014), Hassan (2014), Shelton (2015) and Vatrapu (2016) developed the work on big data further. Wood (2013) tried to understand which elements of nature attract people to locations around the globe, and whether changes in ecosystems could alter visitation rates. Hay (2013) used big data approaches to routinely map the vast majority of infectious diseases of clinical significance; it would be of public health benefit to map about half of these conditions. Crampton (2013) presented an overview and initial results of a geoweb analysis designed to provide the foundation for a continued discussion of the potential impacts of 'big data' on the practice of critical human geography. They argued that, while Haklay's (2012) observation that social media content is generated by a small number of 'outliers' is correct, alternative methods and conceptual frameworks can be explored that might allow one to overcome the limitations of previous analyses of user-generated geographic information.

Table 2
The summary of the contributions of different countries:

Table 3
Country collaboration Table

Table 4
The summary of the most cited articles