Data mining-based recommendation system using social networks—an analytical study

Social media is now used widely and generates enormous amounts of data. This volume makes the data difficult to handle, because it demands considerable storage and processing time. The content produced by social media therefore needs to be stored and processed efficiently using data mining methods in order to provide suitable recommendations. The goal of this study is to perform a systematic literature review (SLR) that finds, analyzes, and evaluates studies on data mining-based recommendation systems using social networks (DRSN) published from 2011 to 2021, and to open a path for scientific investigations that enhance the development of recommendation systems in social networks. The SLR follows Kitchenham's methodology for planning, guiding, and reporting the review. A systematic study selection procedure yields 42 studies, which are analyzed in this article. The selected articles are examined on the basis of four research questions. The research questions cover the publication venues and the chronological and geographical distribution of DRSN studies; the approaches used to formulate DRSN; the datasets, dataset sizes, and evaluation metrics used to validate the results of the selected studies; and the limitations of the 42 studies. The results show that 2018 was the most productive year, accounting for 21% of the 42 articles, and that China contributes 40% of the articles in this domain. Furthermore, 61% of the articles are published in IEEE. Approximately 21% (nine of the 42 studies) use collaborative filtering for providing recommendations. The Twitter dataset is the most common, used by 19% of the studies, and precision and recall are used by 28% of the selected articles to evaluate recommendations in social networks. The limitations show a need for a hybrid model that combines different algorithms and methods for providing recommendations.
The study concludes that hybrid models may help to provide suitable recommendations on social media using data mining rules.


INTRODUCTION
In the last two decades, social media has become popular and affordable for its users, making the world a global village. Social media is a digital media platform that users use for communicating and sharing content (Ilyas & Alharbi, 2022). The content can be personal activities, thoughts, pictures, news, ideas, debates, and government guidelines (Pang & Lee, 2008). Currently, Facebook, Twitter, Instagram, LinkedIn, YouTube, etc. are commonly used social media platforms (Kent, 2010; Howard & Parks, 2012) that produce linguistic data (Kanwal, Qadir & Shaukat, 2021). The list of social networks is shown in Table 1. The social media platforms listed in Table 1 provide facilities for online interaction, formation of social networks, and collaboration among users (Carr & Hayes, 2015). These platforms can be used for collecting data and for mining it in order to provide recommendations. Social media plays a significant role in sharing content online and forming online social networks (Esmaeili & Vancheri, 2010). Social media is therefore a group of applications that work over the Internet to exchange user-generated content. The content produced online needs to be stored in repositories so that it can be used purposefully. Data mining is required to deal with this large, complex, and frequently generated data; otherwise, it is hard to extract meaningful information from social media (Hu & Liu, 2012; Gundecha & Liu, 2012). During data mining, the data is represented in the form of nodes and edges: the nodes represent users, and their interconnections are represented as links, which together form a graph-like structure (Chen et al., 2022). Data mining methods are used for information retrieval (Pang & Lee, 2008), statistical modeling (Koukouvinos, Mylona & Parpoula, 2013), and machine learning (Boiy & Moens, 2009; Alinejad-Rokny, Sadroddiny & Scaria, 2018).
These methods first pre-process the data (Baskar, Arockiam & Charles, 2013), then apply rules to analyze it (Azzalini & Scarpa, 2012), and finally interpret it to obtain results (Smith et al., 2006). The results are represented in the form of graphs (Aggarwal, 2011). Data mining classifies data using unsupervised, semi-supervised, and supervised classification (Simovici, 2015). Association rule mining is a type of data mining that looks for similarities or co-occurrences in a dataset, which makes it easier to extract opinions from the data (Shaukat et al., 2020a). After the data has been classified, the recommendation system (RS) (Nagarnaik & Thomas, 2015; Sharma & Gera, 2013) uses data mining rules to provide recommendations through state-of-the-art recommendation methods: collaborative filtering (CF), content-based filtering (CBF), or a hybrid model (Thorat, Goudar & Barve, 2015). It also uses data mining methods to provide recommendations on social networks (Najafabadi, Mohamed & Mahrin, 2019; Faryal et al., 2015).
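The co-occurrence idea behind association rule mining can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the toy transactions, the function name `pair_rules`, and the thresholds are all hypothetical, and only rules between pairs of items are considered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set holds the hashtags used by one user.
transactions = [
    {"travel", "food", "photo"},
    {"travel", "photo"},
    {"food", "health"},
    {"travel", "food"},
    {"travel", "photo", "health"},
]


def pair_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) rules for item pairs."""
    n = len(transactions)
    # How many transactions contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # How many transactions contain each unordered pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions containing both items
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            # confidence = P(consequent | antecedent)
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


rules = pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "photo -> travel" could then feed a recommender: a user who posts about one topic is suggested content about the other.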
The methods and techniques used for data mining-based RS using social networks (DRSN) are large in number (Najafabadi, Mohamed & Mahrin, 2019; Faryal et al., 2015). It is difficult to identify which methods can provide recommendations using social websites. Furthermore, different online platforms are available; therefore, every study uses data depending on its requirements and chooses evaluation methods accordingly. The overall research trend is also moving towards data mining rules for providing recommendations, but it is still ambiguous (Vairavasundaram et al., 2015). Some surveys (Najafabadi, Mohamed & Mahrin, 2019; Anandhan et al., 2018) focus on data mining methods in recommender systems. Other surveys emphasize data mining methods for social network analysis (Adedoyin-Olowe, Gaber & Stahl, 2013; Injadat, Salo & Nassif, 2016). Furthermore, there are SLRs (Alrashidi et al., 2021; Camacho & Alves-Souza, 2018) and surveys (Eirinaki et al., 2018; Javed et al., 2021) that focus on the domain of RS. The existing survey (Najafabadi, Mohamed & Mahrin, 2019) deals with DRSN; however, it only highlights the studies that use CF methodology, whereas Anandhan et al. (2018) focus on both data mining and RS methodologies but explain them independently without linking how they work together. A comparison of existing studies in the domain of DRSN is given in Table 2. Table 2 indicates that there is scope for an SLR in the domain of DRSN. So, there is a need for an SLR that focuses on all the domains, methodologies, and techniques used by different studies. This article contributes a novel SLR that focuses on data mining rules on social networks for providing recommendations. The SLR deals with four research questions focusing on publication channels, type of research, chronological distribution, and geographical distribution. It also identifies the proposed methods and the most frequently used algorithms that can provide better performance.
Moreover, it studies the datasets, dataset sizes, and evaluation metrics used in the domain of DRSN. Lastly, the limitations of existing work are discussed, because the existing literature does not cover all of the stated parameters in a single place. Therefore, it is necessary to explore the details of DRSN. The research articles are selected by a study selection procedure that uses inclusion and exclusion criteria. It includes peer-reviewed articles. Moreover, this SLR only focuses on articles published from 2011 to 2021. Furthermore, it does not consider articles that are not listed in the Journal Citation Report (JCR) list (Aloui, 2021). Likewise, articles with no evaluation criteria or novel contribution are excluded. Where an already published article has been enhanced, only the latest article is considered.
Section 2 focuses on the literature review; Section 3 defines the research protocol; Section 4 answers the research questions; Section 5 shows the taxonomy based on the domain of DRSN and the limitations of the SLR; Section 6 deals with general observations about the SLR and future directions; and Section 7 concludes the SLR.

LITERATURE REVIEW
This section discusses the many surveys and the few SLRs in the field of DRSN. Some focus on RS, while others focus on data mining rules applied to social media datasets. Two surveys focus on both data mining and RS, but they comprise outdated data and miss several aspects. The literature can be divided into three sections, which are discussed below.

Data mining techniques on social network
This section discusses the surveys that gather data about the data mining techniques to store and manage data coming from social networks. A survey article (Adedoyin-Olowe, Gaber & Stahl, 2013) focused on data mining techniques that are used in social network analysis. It focuses on pre-processing methodologies, data analysis, and data interpretation methodologies used in data mining on Web 2.0 technologies. This article limits its scope to data mining methods. It does not focus on the novel state-of-the-art methodologies in data mining. Also, it only shows the methodologies and tools used for data mining, but it does not work on where the mined data will be utilized afterward.
Furthermore, a survey (Injadat, Salo & Nassif, 2016) on data mining methods for social media networks covers studies from 2003 to 2015. It selected 66 research articles, and its results show that 19 techniques had been used for data mining in combination with social media content up to that time. The study primarily addresses nine research objectives in different domains, but data mining methods in social media remain raw and ambiguous. The article covers the data mining methods used in social media, their research areas, their incorporation with machine learning methodologies, and a comparison among data mining methods. It also mentions the strengths and weaknesses of using data mining on social media datasets. However, the article only reports publication venues along with chronological distribution, and it concentrates on machine learning-based ideas and methods to incorporate with social media. Furthermore, it does not focus on recommendation methods or on data mining on social media data to make it efficient.

RS methods on social network
This section discusses two SLRs and two survey articles that emphasize the methodologies of RS for providing recommendations.
An SLR (Alrashidi et al., 2021) focuses on recommendations based on social networks. It covers 32 articles from 2016 to 2020 and uses the digital libraries Scopus, IEEEXplore, Springer, ScienceDirect, ACM, and Web of Science for data extraction. It also states that hybrid models in recommender systems result in high accuracy, and it considers deep learning for emerging social recommender systems. This article only focuses on the deep learning methodologies used for recommendation; it does not explicitly address data mining methodologies such as association, clustering, and anomaly analysis (Azzalini & Scarpa, 2012) for storing and managing the large data produced by social websites. Moreover, it does not discuss the potential issues or limitations that could help to improve the system.
Another SLR (Camacho & Alves-Souza, 2018) reviews the cold start problem (Lam et al., 2008), in which no information is available in the system for providing recommendations. This work covers studies from 2011 to 2017. It classifies the possible solutions for eliminating social network issues in recommender systems. Its scope is limited to the cold start problem rather than the complete domain of recommender systems, so it deals with a specific type of article.
A survey article focuses on content-based along with context-based RS by considering CBF, CF, and hybrid models. It considers articles from 2000 to 2015, shows the year-wise distribution of articles related to RS, and classifies the methodologies used by RS. This article focuses generally on the methods and techniques used by RS rather than specifically on social networks and ways of mining the data.
Recommender systems are also used for large-scale social network data. A survey based on large-scale social network data (Eirinaki et al., 2018) spotlights different recommendation methods. It also focuses on challenges faced by large-scale RS, such as data variety, volume, and volatility. Moreover, it discusses special-issue articles in this domain and emphasizes context-aware and simple item RS. Nevertheless, it does not cover session-based RS using neural networks or any other data mining method that can be used to deal with huge data.

Data mining rules along with RS
This section discusses the surveys that focus on both data mining and RS while providing recommendations.
One survey concentrates on data mining methodologies in RS (Najafabadi, Mohamed & Mahrin, 2019). It extracted 36 articles from IEEEXplore, ACM, Sage, and ScienceDirect. However, its scope is limited to CF in RS; it does not examine other methodologies used in RS, such as content-based filtering and hybrid modeling. Likewise, another article emphasizes recommender systems using social media (Anandhan et al., 2018). Its scope is limited to two digital libraries and to articles from 2011 to 2015. Furthermore, it only uses a limited set of search keywords: "recommend systems", "forums" or "forum", "social network" or "social networks", "web", "social bookmarking", and "blogs" or "blog". It does not include "data mining" or "social website". The articles by Adedoyin-Olowe, Gaber & Stahl (2013) and Injadat, Salo & Nassif (2016) are surveys that center on data mining methodologies in social networks, whereas Alrashidi et al. (2021) and Camacho & Alves-Souza (2018) are SLRs on RS, Eirinaki et al. (2018) is a survey in the field of RS using social websites, and Javed et al. (2021) is a survey that only discusses RS. Lastly, Najafabadi, Mohamed & Mahrin (2019) and Anandhan et al. (2018) focus on both data mining rules and RS using social media data, but they are survey studies. Table 2 compares the studies and the methodologies they adopted for review. We have not found an SLR in the domain of DRSN dealing with JCR-listed articles from 2011 to 2021. Data mining and recommendation methodologies can be used in conjunction so that they complement each other's results. Thus, there is a need for an SLR that focuses on the domain of DRSN. Table 2 shows that two studies work on the DRSN domain, but both are survey articles (Najafabadi, Mohamed & Mahrin, 2019; Anandhan et al., 2018).
One of these articles only focuses on CF-based methodologies (Najafabadi, Mohamed & Mahrin, 2019), and the other does not show the combined usage of data mining and RS on social networks (Anandhan et al., 2018). The articles by Adedoyin-Olowe, Gaber & Stahl (2013) and Injadat, Salo & Nassif (2016) are centered on data mining. Alrashidi et al. (2021), Camacho & Alves-Souza (2018), and Eirinaki et al. (2018) are RS-based reviews, and Javed et al. (2021) focuses on RS-based methods but does not apply them to social networks using data mining rules.
There is a need for an SLR that deals with all the data mining and recommendation methodologies used in social networks without any bias. This SLR focuses on DRSN-based articles, thus capturing all the schemes used in this domain. It also extracts the elements needed for research, such as datasets, evaluation metrics, techniques, and methods, and considers research trends by evaluating year-wise, country-wise, and publication venue-wise progress in the domain of DRSN, thus gathering all the details about the articles.
Our proposed SLR differs from the reviews listed in Table 2 because we focus on different research questions, which highlight the publication venues in which articles are published. Furthermore, we synthesize the year-wise chronological distribution and the geographical distribution of studies. Moreover, the proposed methodology of each study is evaluated in terms of framework, scheme, method, model, algorithm, or application. Additionally, the domain-based categorization of the proposed solutions is investigated. Likewise, the datasets and dataset sizes used by the different studies are investigated. Similarly, the evaluation criteria used for validating the proposed solutions are analyzed. Lastly, research gaps in DRSN are identified by considering the limitations of the studies. The next section discusses the research methodology used to perform the SLR.

RESEARCH METHODOLOGY
In this SLR, we follow the procedure introduced by Kitchenham et al. (2009), which is based on principles of software engineering. The complete SLR is formulated and executed according to Kitchenham's protocol. This protocol is divided into three main phases (Ajmal & Muzammil, 2019; Muzammil et al., 2021), which are shown in Fig. 1. The steps of the SLR are discussed below.

Planning SLR
In this phase, the structure of the SLR is planned by considering different characteristics of the review. Planning starts with establishing the need for an SLR and includes forming the research questions and identifying the electronic databases from which articles are extracted.

Identifying the need of SLR
This SLR is needed because social media is a widely used platform at present and produces a lot of content that needs to be mined for future predictions. To date, no work has addressed the domain of DRSN explicitly and rigorously. Among the existing articles, Adedoyin-Olowe, Gaber & Stahl (2013) and Injadat, Salo & Nassif (2016) center on data mining in social networks, whereas Najafabadi, Mohamed & Mahrin (2019) and Anandhan et al. (2018) stress both data mining rules and RS using social media data, but they are survey studies with limited scope. Therefore, there is a need for an SLR that focuses on DRSN, includes articles from 2011 to 2021, and addresses research objectives that find out the publication venues, year-wise distribution, country-wise distribution, publisher information, algorithms, schemes, datasets, dataset sizes, evaluation metrics, and limitations of existing articles on DRSN.

Identifying data sources
The search strategy includes the identification of the electronic databases (EDs) that are explored to gather studies on the topic. Table 3 shows the EDs used in this SLR. These databases are the most relevant to the domain of DRSN, as they are well-known scientific EDs and help to find the best results.

Identifying research questions
Research questions (RQs) are the most important part of an SLR. The research is performed by answering them procedurally and systematically. Table 4 shows the RQs along with their motivation. These RQs are addressed in the systematic literature review.

Table 4 Research questions and their motivation.

RQ1: What publication venues work on data mining in recommender systems on social media websites? Explain their year-wise chronological distribution along with geographical distribution.
Motivation: To find out the publication venues for DRSN; the year-wise distribution of DRSN; the country-wise distribution in DRSN; and an analysis based on information such as publisher, country, proposed solution type, and quality criteria score of the selected studies in DRSN.

RQ2: What are the different approaches of data mining along with recommendations adopted in social websites?
Motivation: To find out the algorithms and schemes in which recommender systems are used with data mining.

RQ3: What kinds of datasets and metrics are adopted for conducting DRSN? Also, which methods are considered for validating the performance of DRSN?
Motivation: To find out various details in the domain of DRSN: the dataset, the size of the dataset, and the evaluation metrics.

RQ4: What are the limitations of using data mining in recommender systems on social media websites?
Motivation: To find out the limitations in the field of social website-based recommender systems in order to identify research gaps.

Guiding the SLR
The most important component of an SLR is query formation. The query is generated by combining the terms used in our topic with the "AND" and "OR" operators. To search the selected EDs, a research query is formulated. The search query is a combination of the keywords used in DRSN, such as data mining and recommendation system, and their synonyms. While formulating the search query, different groups are made. Each group comprises a keyword and its synonyms, which are joined using "OR", and the groups are joined using "AND". The query components of DRSN are shown in Table 5, which collectively formulate a synthetic query.
The following research query is formulated using Table 5: (Recommend* OR Recommend* System OR Expert System OR Suggest*) AND (Social) AND (Media OR Forum OR Platform OR Website*) AND (Data Mining). It can be applied to the different EDs specified in Table 3. The structure of the query differs between EDs, as shown in Table 6.
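The grouping rule described above (synonyms joined by OR inside a group, groups joined by AND) can be sketched in a few lines. The keyword groups are copied from the query quoted in the text; the function name `build_query` is hypothetical, and the ED-specific syntax of Table 6 is not reproduced here.

```python
# Keyword groups copied from the research query quoted in the text.
groups = [
    ["Recommend*", "Recommend* System", "Expert System", "Suggest*"],
    ["Social"],
    ["Media", "Forum", "Platform", "Website*"],
    ["Data Mining"],
]


def build_query(groups):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


query = build_query(groups)
print(query)
```

Keeping the groups as data makes it straightforward to add a synonym or a whole group and regenerate the query for each database.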

IC7: Articles having no validation their novelty and total relevance can be used
IC8: In case of some enhancement in an already published article related to DRSN, the latest article will be considered.
IC9: The article is a JCR-listed journal article in the field of DRSN.

Study selection criteria
Studies are selected through a defined flow while performing the SLR. A study must pass all the phases so that only meaningful, high-quality studies remain. There are four steps in the study selection procedure, which are illustrated in Fig. 1.
In the identification phase, the articles are first identified using a search query. In the screening phase, the articles are scanned by title, abstract, and full text, in sequence. The eligibility of the selected studies is then checked against the inclusion and exclusion criteria. Lastly, final inclusion depends on the quality assessment criteria and duplicate removal.

Identification and selection of primary studies
The primary studies are identified using the search query and screened by title, abstract, and full-text analysis. The eligibility of the selected articles is checked against the inclusion and exclusion criteria defined in Tables 7 and 8.

EC3: Articles related to search article recommendation.
EC4: Articles other than journal articles.
EC5: Articles outside the domain of DRSN should not be discussed.

Table 9 Quality assessment criteria.

QAC1: The goals of the article, along with its objectives, should be clear. Score: 1
QAC2: The article should explain its results and experiments properly in detail.
QAC3: The limitations should be stated explicitly in the article. Score: 1
QAC4: The article should have checks for novelty along with validation of the proposed solution.

Studies that pass the eligibility phase are checked against the quality assessment criteria to determine the final set of articles. The quality assessment criteria are shown in Table 9.
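A quality check of this kind can be sketched as follows. The article states elsewhere that each QAC carries one mark (four in total) and that articles scoring more than 1.5 out of four are retained; the partial marks of 0.5, the study labels, and the function names are assumptions made for illustration, since the scoring scale of Table 11 is not reproduced here.

```python
QAC_IDS = ("QAC1", "QAC2", "QAC3", "QAC4")
THRESHOLD = 1.5  # articles scoring more than 1.5 out of 4 are retained


def quality_score(marks):
    """Sum the marks of the four criteria; each mark must lie between 0 and 1."""
    for qac in QAC_IDS:
        if not 0 <= marks[qac] <= 1:
            raise ValueError(f"{qac} mark must be between 0 and 1")
    return sum(marks[qac] for qac in QAC_IDS)


def passes_quality(marks):
    """Return True when the total score is strictly above the threshold."""
    return quality_score(marks) > THRESHOLD


# Hypothetical studies; 0.5 stands for a partially satisfied criterion (assumption).
candidates = {
    "S1": {"QAC1": 1, "QAC2": 1, "QAC3": 0.5, "QAC4": 1},
    "S2": {"QAC1": 0.5, "QAC2": 0.5, "QAC3": 0, "QAC4": 0.5},
}
selected = [name for name, marks in candidates.items() if passes_quality(marks)]
print(selected)
```

Here the hypothetical study S2 scores exactly 1.5 and is therefore not retained, because the rule requires more than 1.5.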

Extraction of data
For data extraction, the data is obtained from the selected studies according to the requirements and RQs. Table 10 lists the fields used to extract data from the articles:

Publication venue: It includes the name of the publication venue or journal.
Publisher: This states the publisher with which the journal is referenced.
Year: It includes the year in which the article is published.
Country: This states the country of the article's first author.
Method/Framework/Technique: This states the type of methodology proposed by the authors.
Domain: It states the community that can benefit from the study.
Algorithm: It shows the algorithm used by the studies as their proposed solution.
Datasets: This includes the dataset that the studies used to perform their experiments.
Datasets size: This includes the size of the dataset that the studies used to perform their experiments.
Evaluation metrics: This lists the evaluation methods that the studies used to validate their proposed system.
Research gaps: It identifies the limitations of the studies that lead to research gaps for new studies.
QAC1, QAC2, QAC3, QAC4: This evaluates the article's score on the basis of the quality assessment criteria shown in Table 9.

Reporting SLR
In the current era, DRSN is an active field because of the immense use of social media. Our SLR identifies the articles relevant to DRSN from different electronic databases. More than 200 articles published from 2011 to 2021 are extracted in the initial phase. The articles are then filtered by the study selection procedure shown in Fig. 2. Figure 3 shows the extraction of articles based on the steps shown in Fig. 2. The identification phase comprises all the studies that are extracted from the different databases. We only used the three EDs listed in Table 3. Initially, we had 277 articles in the identification phase. In the screening phase, the articles are skimmed based on title and abstract, which results in 95 articles. In the third step, eligibility is ensured by applying the inclusion and exclusion criteria, and a total of 46 articles are retained. Finally, 42 studies are selected using the quality criteria. The quality criteria shown in Table 9 are checked for each article. Each article is assessed against the four quality criteria to qualify for the SLR. The assessment has a total of four marks, one for each QAC. Table 11 shows the scoring criteria. The count of eligible articles for each score is shown in Table 12.
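The selection counts reported above (277, 95, 46, and 42 articles) can be tallied with a short sketch that shows how many articles each step retains. The counts come from the text; the stage labels and the function name `retention` are illustrative.

```python
# Article counts after each selection step, as reported in the text.
stages = [
    ("identification", 277),
    ("screening (title and abstract)", 95),
    ("eligibility (inclusion and exclusion criteria)", 46),
    ("quality assessment", 42),
]


def retention(stages):
    """Percentage of articles kept at each step relative to the previous step."""
    rates = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        rates.append((name, round(100 * after / before, 1)))
    return rates


rates = retention(stages)
overall = round(100 * stages[-1][1] / stages[0][1], 1)

for name, rate in rates:
    print(f"{name}: {rate}% of the previous step retained")
print(f"overall: {overall}% of the identified articles retained")
```

Under these counts, screening is the most selective step, while the quality assessment removes only four of the 46 eligible articles.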
The articles with a quality score of more than 1.5 out of four are added to this SLR. Figure 4 shows the quality ratio of articles related to each QAC, which shows that all

RESULTS AND DISCUSSION
This section discusses the selected 42 articles based on the proposed research questions.
RQ 1: What publication venues work on data mining in recommender systems on social media websites? Explain their year-wise chronological distribution along with geographical distribution.
Data mining methodology is essential for mining data to provide recommendations. The data produced by social media websites is considered big data, which needs to be stored and mined properly using data mining methods to obtain suitable recommendations. The 42 selected studies are classified by publication venue, publisher, country, and proposed solution. Figure 5 shows the year-wise chronological distribution of the selected articles. It shows that nine articles were published in 2018, the highest number in any year, followed by eight articles in 2020. Figure 6 displays the geographical distribution according to the first author of each article. Seventeen of the 42 articles have first authors from China, which means that Chinese authors contribute 40% of the work in the domain of DRSN.
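The shares quoted in this discussion follow directly from the counts over the 42 selected studies. A minimal sketch of the arithmetic, with a hypothetical helper named `share`:

```python
TOTAL_STUDIES = 42


def share(count, total=TOTAL_STUDIES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts reported in the text.
counts = {
    "published in 2018": 9,
    "published in 2020": 8,
    "first author from China": 17,
}
for label, count in counts.items():
    print(f"{label}: {count}/{TOTAL_STUDIES} = {share(count)}%")
```

The values 21.4% and 40.5% correspond to the 21% and 40% quoted in the article.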
Table A1 in the appendix categorizes all selected articles according to the year in which they were published. It comprises the publication venue, publisher, year of publication, geographical location of the first author, type of proposed methodology, and quality assessment score. The quality assessment score is evaluated based on the criteria shown in Table 9. From Table A1, it is synthesized that 26 articles are published in IEEE, ten are published in Elsevier, three are extracted from Springer, one from ACM, and Science Direct, which is demonstrated in Fig. 7. The IEEE, ACM, and Springer databases are used during the study selection process; the others are identified through the snowballing process of finding articles.
Moreover, the research trend of DRSN can be evaluated in Fig. 8, which shows that data mining, RS, and social networks are related to each other. It also shows some of the terms related to the domain. This figure is generated to check the research trend of selected articles based on the most frequently occurring words in their abstracts.
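A term-frequency count of the kind that underlies such a figure can be sketched as follows. The three abstracts, the stop-word list, and the function name `term_counts` are hypothetical; the tool actually used to generate Fig. 8 is not stated in the text.

```python
import re
from collections import Counter

# Hypothetical abstracts standing in for those of the selected articles.
abstracts = [
    "Data mining supports recommendation in social networks.",
    "A recommendation system uses data mining on social media data.",
    "Clustering and data mining improve social recommendation.",
]

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "and", "in", "on"}


def term_counts(texts):
    """Count lower-cased word occurrences across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts


counts = term_counts(abstracts)
print(counts.most_common(4))
```

The most frequent terms can then be drawn as a word cloud or a co-occurrence map.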

RQ 2: What are the different approaches to data mining or recommendation adopted in the recommendation of social websites?
The proposed solution can be an algorithm, method, framework, model, or application. The analysis shows that 35% of the solutions are frameworks, 51% are methods, 6% are applications, 6% are models, and 2% are algorithms. Figure 9 shows the percentage distribution of the proposed solutions.
Furthermore, a proposed solution can either work on social media generically or apply social media data to a specific domain. Social media in general is considered by 27 studies. On the other hand, five articles use it for travel purposes (Fang et al., 2014; Wang et al., 2014; Xu, Chen & Chen, 2015; Zhao, Qian & Xie, 2016; Jamiy et al., 2015), two articles use it in health (Zhao et al., 2018; Zhang et al., 2019), two articles use it for web-based solutions (Jiang et al., 2016; Alduaiji, Datta & Li, 2018), two articles use it for mobile phone-based solutions (Zanda, Eibe & Menasalvas, 2012; Majid et al., 2013), a single article focuses on museums (Qin et al., 2018), one article deals with shop-type recommendations (Moro, Rita & Vala, 2016), and one article deals with stock data (Zhou et al., 2019). The distribution of domains in DRSN is shown in Fig. 10.
DRSN mainly focuses on data mining and recommendation techniques. The data mining methodologies include Bayesian networks (Min & Cho, 2010), clustering (Zanda, Eibe & Menasalvas, 2012; Yang, Qu & Cudré-Mauroux, 2018; Qin et al., 2018; Milovanović et al., 2019), k-NN (Majid et al., 2013; Adeniyi, Wei & Yongquan, 2016; Nguyen & Cho, 2020; Xu, 2018), topic season matrix mining (Jiang et al., 2016); support vector machines, Moro, Rita & Vala (2016) et al., 2019). RS, in turn, is divided into three main state-of-the-art categories: CBF, CF, and the hybrid model. CBF comprises algorithms such as term frequency-inverse document frequency (Majid et al., 2013; Jiang et al., 2016; Wen et al., 2017; Psyllidis, Yang & Bozzon, 2018), topic modeling (Wang et al., 2014; Lwowski, Rad & Choo, 2018), latent Dirichlet allocation (Wang et al., 2014; Fang et al., 2014; Pyo & Kim, 2014; Jiang et al., 2015; Xu, Chen & Chen, 2015; Shi et al., 2017; Sang, Yan & Xu, 2018; Cui et al., 2018; Psyllidis, Yang & Bozzon, 2018; Nguyen & Cho, 2020; Ge et al., 2020), feature extraction (Yu et al., 2016), word2vec (Zhao et al., 2018), and natural language processing (Psyllidis, Yang & Bozzon, 2018). On the other hand, CF uses the user-item matrix (Pyo & Kim, 2014; Jiang et al., 2015; Moro, Rita & Vala, 2016; Iqbal et al., 2019; Manca, Boratto & Carta, 2018; Zhang et al., 2019; Ju, Wang & Xu, 2019; Margaris, Vassilakis & Spiliotopoulos, 2020; Shahbaznezhad, Dolan & Rashidirad, 2021), the friend-matching graph (Wang et al., 2014), social network analysis (Wu et al., 2015; Wu et al., 2019), matrix factorization (Zhao, Qian & Xie, 2016; Yu et al., 2016; Zhao et al., 2018; Xu, 2018), classification (Yang & Jiang, 2018), and graph theory (Alduaiji, Datta & Li, 2018; Ahmadian et al., 2020). Therefore, in DRSN, it is observed that data mining and recommendation methods are used together to provide better results. Moreover, the hybrid model is also used by different studies in the form of algorithm concatenation.
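To make the user-item matrix idea concrete, the following is a minimal sketch of user-based CF with cosine similarity. It is a generic textbook formulation, not the method of any particular reviewed study; the ratings, the user names, and the function names are hypothetical.

```python
from math import sqrt

# Hypothetical user-item ratings (a sparse user-item matrix stored as nested dicts).
ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob": {"item1": 4, "item2": 3, "item3": 5, "item4": 4},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine(ratings[user], their_ratings)
        numerator += similarity * their_ratings[item]
        denominator += abs(similarity)
    return numerator / denominator if denominator else None


prediction = predict("alice", "item4", ratings)
print(f"predicted rating of item4 for alice: {prediction:.2f}")
```

In this toy example the prediction lies between the two neighbours' ratings of item4 and is pulled towards the rating of the more similar user.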
Table 13 shows the methods, schemes, or algorithms of the selected studies. Most of the studies use CF for providing recommendations to users and cluster the data for mining using the same method. Around nine of the 42 studies use CF in the domain of DRSN. Other schemes are not used as frequently.

RQ 3: What kinds of datasets and metrics are adopted for conducting DRSN? Also, which methods are considered for validating the performance of DRSN?
The datasets used by the selected studies are shown in Table 14. Twitter, Facebook, Flickr, and Foursquare are used by eight, six, six, and four studies, respectively. Three or fewer studies use each of the remaining datasets. Of all available social networks, Twitter's user population shows a consistent growth trend, and it provides a platform for people to connect, communicate, and exchange their experiences with each other. Consequently, Twitter is widely adopted by researchers who perform a thorough analysis of social media. One reason may be that Twitter provides a broad range of user activity data points and easy access to this data. Five of the 42 articles use real-time datasets (Min & Cho, 2010; Wang et al., 2014; Wu et al., 2015; Xu, 2018; Ju, Wang & Xu, 2019). All the other articles use offline datasets. Table 14 also gives the size of the dataset used by each study. Several methodologies have been adopted to analyze and validate the research in the various studies using evaluation parameters. Evaluation is part of experimentation and is used to justify the research work through qualitative and quantitative experiments. One article evaluates its results qualitatively (Wu et al., 2015), using feedback from users, while all other studies use quantitative evaluation metrics to assess validity. The evaluation metrics used in the domain of DRSN are shown in Table 14. A total of 12 studies use precision and recall for evaluation independently.
Precision and recall provide insights about the recommendations that accuracy alone cannot provide. The true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are crucial for providing recommendations, especially in health care. Precision and recall are computed directly from TP, FP, and FN, so they are well suited to exposing these counts efficiently. For this reason, they are used by most of the studies. The F1 score is used by eight articles, whereas the root mean square error (RMSE) and mean absolute error (MAE) are used by five studies. Lastly, mean average precision (MAP) is used in four studies. Fewer than four articles use each of the remaining evaluation metrics.
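The metrics named above follow standard definitions, and a minimal sketch of how they can be computed is given here. The function names and the sample recommendation lists are hypothetical and do not come from any of the selected studies.

```python
from math import sqrt


def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and relevant
    fp = len(recommended - relevant)   # recommended but not relevant
    fn = len(relevant - recommended)   # relevant but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


if __name__ == "__main__":
    # Hypothetical example: four recommended items, three truly relevant ones.
    print(precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"]))
    print(mae([4, 3, 5], [5, 3, 3]), rmse([4, 3, 5], [5, 3, 3]))
```

Note that TN does not appear in any of these formulas, which is why precision and recall remain informative when the set of non-relevant items is very large, as is typical for recommendation data.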
The datasets and evaluation metrics are an essential part of successful research. The top research trends in the domain of DRSN are shown in Figs. 11 and 12, and the complete details are given in Table 14.

The research process provides novelty and adds contributions to the world. On the other hand, each research work has certain limitations. Some researchers state them explicitly in their work, while others do not. Where they are not stated, the reader has to identify them by critically reading the article. The limitations of the selected studies are extracted and stated in Table 15. The results show that even though RS has received a lot of attention lately, there are still several challenges and possibilities that will influence the future of RS for researchers. Additionally, a variety of strategies that include various aspects can be used to create social recommendation systems that achieve high accuracy. As a result, these limitations lead to research gaps on which future work can be performed.
From the limitations explained in RQ4, it is evaluated that there is a need for a hybrid model. This model will use both contents and statistics to deal with the data. Besides the hybridization of CBF with CF, other methods can be combined with these algorithms. Furthermore, the studies have not shown a proper methodology to mine or store data that can be further used to provide accurate recommendations. A correct methodology to store data is required for efficient recommendations. Moreover, proper evaluation methods need to be applied and explained in detail, because the studies rely on the most commonly used evaluation metrics and require proper measurements to validate the work.

Table 13 (excerpt). Methods, schemes, or algorithms of the selected studies.
Study | Method, scheme, or algorithm
Yang, Qu & Cudré-Mauroux (2018) | Cluster-wise obfuscation function learning + probabilistic historical data obfuscation, personalized activity-wise obfuscation function learning, probabilistic online activity obfuscation
Qin et al. (2018) | Clustering + singular value decomposition
Cui et al. (2018) | Latent friends mining + K-means clustering algorithm + all reviews from a user and all tags from their corresponding items algorithm, weighted local random walk with restart
Shahbaznezhad, Dolan & Rashidirad (2021) | Collaborative filtering

DISCUSSION AND LIMITATION
The previous section discusses the findings and results gathered from the research questions. The articles are selected from particular EDs that are related to DRSN, and the selected articles come from JCR-listed venues, which also covers the Scopus database. This section summarizes and discusses the findings gathered from the research questions and formulates a taxonomy that provides an overview of the results.

Taxonomy order
The main objective of this SLR is to observe the latest trends in DRSN by considering 42 articles. To accomplish this purpose, a hierarchical taxonomy of the selected articles is formulated and shown in Fig. 13. To develop the hierarchy, several codes are assigned to the different methodologies while performing the SLR. The codes are shown in Table 16. The taxonomy gives a broader view of the SLR. It inspects advances and challenges in the domain of DRSN in terms of recommendation approaches, recommendation domains, data used in RS in social media, data mining methodologies, and performance metrics. Its sub-levels are also shown to give better insight into DRSN and its types.

Limitation of SLR
A limitation of this SLR is that renowned EDs, namely Scopus and Web of Science, are unavailable in our region. More EDs could be used to extract data.

GENERAL OBSERVATION AND FUTURE WORK
The SLR focuses on four research questions that are discussed in this section along with some future directions. RQ1 shows that most of the studies are published in IEEE, which implies that IEEE focuses more on recommendation systems and the data mining domain. Furthermore, it was found that China contributes the most to DRSN, as researchers in China are progressing toward a better understanding of current domains. It was also found that most of the articles were published in 2018, followed by 2020, which shows that social media greatly impacts society. It is very important to utilize this data and thus provide better recommendations to users. The recommendations provided to users are based on their search patterns and their way of using social media. Different users receive different recommendations because each user has their own model, patterns, and likings. Therefore, social media plays a significant role in providing recommendations according to the user's interest.
RQ2 presents five types of solutions used by the studies, namely frameworks, methods, applications, models, and algorithms. The most common type of contribution is a method, followed by a framework. Furthermore, it was found that most of the studies use the CF algorithm for providing recommendations. CF can be used for both data mining and recommendation purposes because it formulates a graph-like structure at the back end. Other schemes use CBF and data mining. Some studies use a hybrid algorithmic model that combines different methodologies. This hybridization adds flexibility to the solution, because the limitations of one algorithm can be minimized with the help of another.
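One simple way to hybridize two recommenders is a weighted linear blend of their scores. The sketch below illustrates that specific technique only; the selected studies may combine algorithms differently. The function names, the weight `alpha`, and the sample scores are hypothetical, and both score sets are assumed to be normalized to the same range.

```python
def hybrid_score(cf_score, cbf_score, alpha=0.5):
    """Weighted blend of a CF score and a CBF score.

    alpha = 1.0 uses CF only, alpha = 0.0 uses CBF only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cf_score + (1.0 - alpha) * cbf_score


def rank_items(cf_scores, cbf_scores, alpha=0.5):
    """Rank all items seen by either recommender, best first.

    An item that one recommender cannot score (for example, a new item
    with no ratings for CF) contributes 0.0 from that recommender.
    """
    items = set(cf_scores) | set(cbf_scores)
    combined = {
        item: hybrid_score(cf_scores.get(item, 0.0), cbf_scores.get(item, 0.0), alpha)
        for item in items
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(combined, key=lambda item: (-combined[item], item))


if __name__ == "__main__":
    # Hypothetical normalized scores from the two recommenders.
    cf = {"x": 0.9, "y": 0.2}
    cbf = {"y": 0.8, "z": 0.6}
    print(rank_items(cf, cbf, alpha=0.5))
```

In this toy case item "z" has no CF score at all but still receives a rank from its CBF score, which illustrates how one method can compensate for a limitation of the other.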
RQ3 presents the datasets and evaluation metrics used by the primary studies in the domain of DRSN. Both are an integral part of the research process. Data is the most important thing in social media because all the recommendations are based on it. A dataset can be used in either online or offline mode. Most studies use offline datasets because they are easy to handle. Moreover, the studies most frequently use the Twitter dataset because Twitter is a big social media platform known for accurate and reliable data. Furthermore, Facebook and Flickr are social media platforms that depend on recommendations. Therefore, Twitter, Facebook, and Flickr are widely used datasets. A large amount of data helps to achieve better results but consumes a lot of processing time.
A small data set, by contrast, takes less processing time, but it may or may not compromise the quality of the results. After that, computations or algorithms are applied to formulate a proposed solution. Once the solution is complete, it needs to be evaluated, either qualitatively or quantitatively. Most of the articles focus on quantitative evaluation, in which precision and recall are the most used evaluation metrics.

RQ4 focuses on the limitations of the studies evaluated in the domain of DRSN. It is noted that most studies hybridize algorithms to provide recommendations, but they do not hybridize methodologies or schemes such as CBF with CF. Hybridizing the methods helps to eliminate the limitations of each. Moreover, more evaluation metrics need to be implemented to validate the results. Furthermore, the limitations in existing studies lead to research gaps. The analysis of the studies highlights a gap in existing work that deals with users' changing or new interests along with concept drift. Furthermore, security-based methodologies are required to deal with social media while ensuring the accuracy of the recommendations. Lastly, a methodology is needed that avoids universal recommendations.

Different findings are extracted from this SLR. Four RQs were formulated to examine the methods, approaches, tools, data sets, and performance metrics for recommendation using data mining rules on social media facts and to specify an exclusive indication of the subject matter. Many mixed results can be noted in dealing with the issues in DRSN systems. Table 17 shows the RQs that will help formulate a new primary study in DRSN. The details of the future directions are given below.

Table 17. Research questions for future primary studies in DRSN.
ID | Research question
RQ1 | How does data mining deal with changing interests in social data to provide better recommendations?
RQ2 | What are the criteria for managing concept drift in social media, and how will mined data contribute to getting recommendations accordingly?
RQ3 | How does RS support security in social media using IoT devices?
RQ4 | How can accuracy in recommendations be ensured given the huge amount of social media-related data?
RQ5 | How does data mining help to avoid universal recommendations in social media, given that the data is in bulk quantity?

Changing trends in recommendation systems
In social media, user trends can change depending on the current situation. This leads to the cold start problem (Lam et al., 2008) and the grey sheep problem (Gras, Brun & Boyer, 2016), in which the system has no prior information about the new trend. Recommendation systems require data to provide recommendations, so the lack of information about new trends is a limitation. The cold start problem cannot be eliminated entirely because it results in data sparsity, which makes feature extraction and similarity management difficult (Natarajan et al., 2020). In the future, data mining can be used to handle data based on semantics, so that similar data can be incorporated for new trends and the cold-start and grey sheep problems can be minimized.
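Data sparsity and cold-start items can both be measured directly from the interaction log. The following is a minimal sketch under the assumption that interactions are stored as (user, item) pairs; the function names and sample data are hypothetical.

```python
def sparsity(interactions, n_users, n_items):
    """Fraction of the user-item matrix that has no observed interaction."""
    observed = len(set(interactions))  # ignore duplicate (user, item) pairs
    return 1.0 - observed / (n_users * n_items)


def cold_start_items(interactions, all_items):
    """Items that no user has interacted with yet."""
    seen = {item for _, item in interactions}
    return sorted(set(all_items) - seen)


if __name__ == "__main__":
    # Hypothetical log: 3 users and 4 items, but only 3 observed interactions.
    log = [("u1", "a"), ("u1", "b"), ("u2", "a")]
    print(sparsity(log, n_users=3, n_items=4))
    print(cold_start_items(log, ["a", "b", "c", "d"]))
```

A high sparsity value or a long list of unseen items indicates that similarity-based methods have little evidence to work with, which is the situation in which semantic or content-based information becomes most useful.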

Concept drift in recommendation systems
The state-of-the-art recommendation models are not versatile enough to deal with dynamism. A few studies overlook the significance of a user's historical data or experiences during recommendation, which affects the results. This results in concept drift, which arises in several ways and at various times and should be dealt with accordingly (Klinkenberg & Renz, 1998). For this purpose, certain algorithms should be explored to find the correct research path. Furthermore, the evaluation of concept drift requires real-time assessment because users interact with social media dynamically. Therefore, future research requires assessment methods that differ from state-of-the-art measures such as precision, recall, and diversity.
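As an illustration of what a real-time assessment could look like, the sketch below tracks the hit rate of recent recommendations in a sliding window and flags possible drift when it falls well below an initial baseline. This is a simple heuristic chosen for illustration, not a method from the cited work; the class name, the window size, and the drop threshold are all assumptions.

```python
from collections import deque


class WindowedHitRate:
    """Flags possible concept drift from a falling recommendation hit rate."""

    def __init__(self, window=50, drop=0.2):
        self.window = window      # number of recent recommendations to track
        self.drop = drop          # tolerated fall below the baseline hit rate
        self.recent = deque(maxlen=window)
        self.baseline = None      # hit rate of the first full window

    def update(self, hit):
        """Record one outcome; return True if drift is suspected."""
        self.recent.append(1 if hit else 0)
        if len(self.recent) < self.window:
            return False          # not enough data yet
        rate = sum(self.recent) / self.window
        if self.baseline is None:
            self.baseline = rate  # first full window defines the baseline
            return False
        return self.baseline - rate > self.drop


if __name__ == "__main__":
    # Hypothetical stream: users accept early recommendations, then stop.
    monitor = WindowedHitRate(window=4, drop=0.3)
    outcomes = [1, 1, 1, 1, 0, 0]
    print([monitor.update(h) for h in outcomes])
```

When drift is flagged, a system could retrain on recent data or reduce the weight given to older user history; which response is appropriate depends on the application.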

Ensuring security in social media using recommendation systems
In social media, security is a major concern because the data is available online and can be used at any time by any user if proper privacy policies are not applied to the system (Jamiy et al., 2015). Social media is growing at a great pace, so the system faces challenges that are difficult to handle. Criminals attempt to breach the security barrier with innovative methods, and traditional methods are unable to identify complex and zero-day attacks. New, reliable solutions are required to deal with these issues. AI models are therefore used for this purpose and to manage the time complexity factor (Shaukat et al., 2020b). A real-world and logical solution is required to target social network security assurance. RS should develop a mechanism that verifies the data in Internet of Things (IoT) devices, as data is the main concern of DRSN. The current world relies on IoT and is moving towards advanced IoT, so security should not be compromised, in order to maintain the privacy of each individual.

Accuracy in recommendation systems
In RS, results should be accurate in order to provide correct recommendations. However, there are some loopholes: some systems provide high precision on a public data set, but on a private data set the system's accuracy and the privacy of the data conflict. With highly private data, accuracy is low, and vice versa, which cannot be improved further (Xu et al., 2020). So, a system is required that deals with the dynamic interactions of the user. Such common recommendations are due to insufficient response from the user (Xu et al., 2020). Therefore, the system requires optimization and proper human-computer interaction to achieve better accuracy on private data.

Avoiding universal recommendations
The current era is a time of data explosion, as social media produces a large amount of data. RS was developed to deal with big data. RSs have different applications that evolve over time and thus result in an excess of information. However, the system's performance depends on the data it uses, and different data sets in recommendation systems show different results depending on the domain. Dissimilar data sets are often not compatible with each other, which results in poor recommendations. So, in the future, to increase the usefulness of the system, data mining methods using graph theory based on semantics need to be implemented to eliminate the issue of providing universal recommendations. Such methods formulate graphs in which all the data is linked through semantics, and they provide better results according to the requirements.

CONCLUSION
This article performs an SLR in the domain of DRSN. All the abbreviations used in this SLR are shown in Table A2. A total of 42 articles were investigated and explored. Their publication venues are extracted along with publisher details. A total of 26 articles out of 42 are published in IEEE. Furthermore, the chronological distribution was evaluated, showing that most articles were published in 2018, with nine research articles. The proposed solutions in the domain of DRSN are mostly models and frameworks. Moreover, the geographical distribution was evaluated according to the first author of each article, which shows that China has the highest ratio of work in this domain. Also, the method most widely used by the articles is CF, which is used by nine studies. Additionally, most articles contribute to social media, but some work in other domains, such as travel and health. The most commonly used data set is Twitter, used by eight studies, and the top evaluation metrics in the domain of DRSN are precision and recall, which are used by 12 studies each. Lastly, the limitations show a need for a hybrid model that concatenates the CBF and CF methods, with or without other methods, for providing recommendations. In the future, more research questions can be added to expand the research in the domain of DRSN.

APPENDIX
This section shows the details of the selected articles used in the SLR, including publication venue, publisher of the journal, year of publication, country of the first author, type of proposed methodology, and quality score, which is calculated on the basis of the QAC mentioned in Table 9. Table A1 shows the summary.
Table A2 comprises the list of abbreviations, acronyms, and various notations.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.

Data Deposition
The following information was supplied regarding data availability: There is no raw data for this literature review.