Recommendation and Classification Systems: A Systematic Mapping Study



Introduction
The great growth in the amount of data and information that can be accessed (known as Big Data), coupled with government collaboration to provide open information (Open Data), makes companies very interested in this issue. One of the biggest problems in this area is that this information is not found in one single place, nor even in a common format. Therefore, it is necessary to create solutions that collect these dispersed data and apply a specific treatment so that they can be offered to their customers. The collection of dispersed information and its unification in order to be able to work with it would open a new market niche, a new business unit, considering the possibility of generating valuable data automatically. In addition, it would increase independence when making decisions or solving problems without having to resort to an expert in business management.
The ADAGIO project was born in this context. It is an R&D project that combines Big Data and machine learning (ML) strategies for the treatment of geolocated data extracted from heterogeneous data sources. It enables the aggregation, consolidation, and normalization of data from different semantic fields obtained from the sources mentioned before. Its purpose is to allow reconciled information to be consulted using specific variables, thus facilitating the generation of knowledge. The application of classification and recommendation systems in this project is of great interest for the interrelation and periodic consolidation of the data process so that the system develops capabilities for transformation, interrelation, and integration of data through supervised learning. In addition, these systems provide great value for the management of queries, enhancing the performance of queries made by users in a language as natural and high level as possible. Ensuring that the user obtains good results when searching the ADAGIO platform is one of the main objectives of the project. In order to improve the user's experience, suggestions are proposed while the search parameters are being filled in. For this phase, the collaboration of the system users will also be required, evaluating the results of the searches according to their quality and precision.
This study has been performed to help researchers and practitioners choose the most appropriate system, technology, or algorithm to include in the ADAGIO project to satisfy its requirements. In this sense, this paper presents a systematic mapping study (SMS) that analyzes the current state of the art of recommendation and classification systems and how they work together. Then, from the point of view of the software development life cycle, this review also shows that the work being done on ML (for classification and recommendation) in research and industrial environments is far removed from early stages such as business requirements and analysis. This makes it very difficult to find efficient and effective solutions that support real business needs from an early stage. Consequently, this paper suggests the development of new ML research lines to facilitate its application in the different domains.
This paper is organized as follows. Section 2 describes the closest related work to our proposal; Section 3 details the selected method to carry out the SMS; Sections 4 to 8 illustrate the execution of the different phases of the SMS; and finally, Section 9 summarizes the conclusions obtained from the study and presents a set of future work.

Related Work
Recommendation and classification systems are acquiring much interest within the scientific community. In this section, the closest related works to the research proposed in this article are presented.
Jaysri et al. [1] presented a complete review of recommendation systems, focusing on collaborative filtering. It shows different algorithms based on this filtering for both the user profile and the product characteristics. In addition, it demonstrates several classification methods that may be part of the input for recommendation systems. Ekstrand et al. [2] presented a general overview of the field of recommendation systems. Their purpose was to learn more about the current development of recommendation methods, especially systems that make use of collaborative filtering.
A research perspective on how to make decisions when choosing algorithms to propose recommendations can be found in the paper presented by Gunawardana and Shani [3]. It criticizes the use of online methods, which can offer measures to choose recommendation algorithms, and identifies the use of offline tools to obtain these measures as a crucial element. In addition, it discards the use of traditional metrics for choosing the algorithm and reviews the proper design of experiments to make that choice. To do this, the authors analyze important tasks of recommendation systems and classify a set of appropriate and well-known assessment measures for each task.
Poussevin et al. [4] exposed the challenge of considering the preferences of users when recommending. The authors analyzed a combination of recommendation systems and classifiers that highlight words that indicate a gap between users' expectations and their actual experience. They conclude that traditional recommendation systems analyze the past classifications; that is, they consider the users' preferences history, while the recommendation systems that analyze the opinion classifications consider the existing evaluations at that moment.
Within the scope of ML, there has been an increase in the interest of the research community, and the topic has been the subject of many papers. Some of the proposals use lexical classifiers to detect possible feelings using content-based recommendations [5]. Other authors have focused on more traditional branches of ML, using well-known and proven statistical methods such as logistic regression, the Pearson correlation coefficient, or the application of naive Bayes, based on Bayes' theorem, among others [6]. The authors of that paper focused on extending these methods to solve problems inherent in recommendation systems such as cold start or scalability. Cold start [7] has been a typical problem since the beginning of recommendation systems because, when a system does not have enough data, precision cannot be assured when recommending. This problem is at its worst at the beginning of the implementation of a system, when data are not available. Scalability has become a quite difficult task due to the increase of information in recent years and the amount of data that systems must manage. Recommendation systems, both product-based and user-based, suffer in performance and accuracy when these amounts of data are very large. The work presented by Ghazanfar and Prügel-Bennett [8] also focused on this problem, generally for user-based recommendation, which is the most used.
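To make the cold-start problem described above concrete, the following minimal sketch implements a user-based collaborative filter that weights neighbours by the Pearson correlation coefficient. It is an illustration only, not the method of any cited paper; the rating data, the user names, and the function names `pearson` and `predict` are assumptions made for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation computed over the items both users rated."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0  # too little overlap to measure similarity (cold start)
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Similarity-weighted average of neighbours' ratings, or None if no neighbour helps."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim > 0:  # keep only positively correlated neighbours
            num += sim * other_ratings[item]
            den += sim
    return num / den if den else None

# Hypothetical ratings on a 1-5 scale; "new" has no history yet.
ratings = {
    "ana": {"a": 5, "b": 3, "c": 4},
    "ben": {"a": 4, "b": 2, "c": 3, "d": 5},
    "cal": {"a": 1, "b": 5, "c": 2, "d": 1},
    "new": {},
}

print(predict(ratings, "ana", "d"))  # driven by "ben", whose tastes match "ana"
print(predict(ratings, "new", "d"))  # None: no data, so no prediction
```

The second call returns no prediction because a user without ratings has no measurable similarity to anyone, which is the situation the extensions discussed above try to mitigate.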
Another interesting related work focused on the use of ML is the survey on sentiment classification presented by Hailong et al. [9]. In this work, the authors also provide a comparative study of the techniques found, concluding that supervised ML methods present a higher accuracy, while lexicon-based methods are likewise competitive because they require less effort and are not sensitive to the quantity and quality of the training dataset. The survey presented by Mu [10] delivers a review of deep learning-based recommender systems. The work concludes by summarizing a set of future research lines such as cross domain, scalability, explainability, or deep composite model-based recommender systems, among others. The paper by Portugal et al. [11] presents a systematic review of the use of ML in recommender systems. The authors analyzed 121 primary studies classified in different categories: content-based and neighbor-based for content-based filtering, neighborhood-based and model-based for collaborative filtering, and hybrid filtering. This work helps developers to recognize the algorithms, their types, and trends in the use of specific algorithms. It also offers current evaluation metrics and categorizes the algorithms based on these metrics. Ouhbi et al. [12] proposed a deep learning-based recommender system to overcome some limitations of existing approaches. In the related work section of that paper, the authors describe a small state of the art of deep learning-based recommender systems, detailing the method, approach, metric, dataset, advantages, and disadvantages of seven proposals.
Zhang et al. [13] delivered a wide review of deep learning-based recommender systems, proposing a classification and highlighting a group of the most influential. The authors debate the pros and cons of using deep learning techniques for recommendation tasks. Additionally, some of the most pressing open problems and promising future extensions are detailed.
In summary, the literature review presented different topics, which may come close to the objective pursued. But there are several differences between these papers and the one presented in this work: (i) the review process: unlike the rest of the papers, this research presents a systematic and rigorous process, ensuring the quality of the results obtained; (ii) the context of application: usually reviews are carried out on the scientific literature; in this case, this research also presents a review on the industrial scope, analyzing the main existing solutions to the problem; and (iii) the scope of application: in this systematic review, the state of the art of the classification and recommendation systems is presented working together, something that in the related works already mentioned is not carried out or it is done independently for classification or recommendation.

Methodology
A systematic literature review is an effective way of knowing the state of the art of a subject. This procedure ensures a certain level of quality of information and has the support of the research community. The monitoring of a systematic and guided process guarantees reliable and interesting results and facilitates the work of gathering information. The review presented in this paper is placed within the context of the recommendation and classification systems from two perspectives: scientific and industrial.
When carrying out a systematic literature review (SLR), the main methodology to be considered is the one presented by Kitchenham and Charters [14]. This is one of the most widely accepted methods in the area of software engineering. It offers a way of performing an SLR consisting of three phases: planning the review, conducting the review, and reporting the results. However, instead of performing a deep review of the papers comparing them, which is the main goal of an SLR, this study seeks to provide an overview of an interesting topic and to identify the number and type of published related research, as well as the related results available. Therefore, the best methodology to be applied is the systematic mapping study (SMS) presented by Petersen et al. [15], a type of systematic review but with a broader objective. This method allows identifying the subjects that lack empirical evidence and for which more empirical studies are necessary. SMSs show many similarities with respect to SLRs. As can be seen in the activity diagram of Figure 1, this method establishes a set of five steps, each of which produces an output. These steps are as follows: (i) definition of research questions, (ii) conduct search, (iii) screening of papers, (iv) keywording using abstracts, and (v) data extraction and mapping process. This activity lets the researchers classify the state of the art of the topic and identify gaps and possibilities for future research.

Definition of Research Questions
A Research Question (RQ) is the fundamental core of a research project, study, or literature review. Therefore, to know and better understand the existing literature related to recommendation and classification systems, it is necessary to formulate a set of research questions. These questions focus the study, determine the methodology to be established, and guide all the stages of this research. In this sense, the RQs that have been proposed for this SMS are as follows:
(i) RQ1. Which recommendation and classification systems have been researched?
(ii) RQ2. Which recommendation and classification systems have been used?
(iii) RQ3. What is the nature of the systems found?
(iv) RQ4. What are the objectives pursued in the proposals found?

Conduct Search
Before performing the search in the different digital libraries, it is necessary to complete two operations: define the digital libraries where the searches will be executed and establish the keywords that will compose the search strings. The digital libraries selected to carry out the search were the following: SCOPUS, IEEE Xplore, ACM, and ScienceDirect. In addition, for the industrial scope, the search engines selected were Google, Yahoo, and Bing.
To specify the search, keywords were defined; they are a fundamental part of the queries created for each digital library. These keywords were obtained after analyzing the field of study to which this research applies: recommendation and classification systems. Table 1 shows the complete set of keywords used, and equation (1) shows the formula applied to these keywords to create the final queries.
Equation (1): Boolean expression of keywords.

Once all the keywords were defined, the queries were constructed. These queries were different for each digital library, and they had different boundary characteristics, depending on the possibilities of the digital library. Digital libraries have certain limitations when conducting searches. For example, some of them do not allow the use of complete search strings; in others, it is necessary to complement these strings with simple textual searches. For this reason, individual queries had to be created for each library and, subsequently, the search results had to be processed to obtain the same results that the originally proposed query would have produced. Table 2 shows a set of example queries for each digital library. The search was executed on the title, abstract, and keywords of the papers, except in those digital libraries that did not allow it. In such cases, the search was performed on the complete text. Search strings, metadata of found elements (title, author, and year of publication), and summaries of the documents were stored for each search source. The first search returned an initial set of 1,195 potential primary studies.
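As an illustration of how keywords can be combined into a search string, the sketch below assumes the common pattern in which synonyms within a group are joined with OR and the groups are joined with AND. Both the pattern and the keyword groups are assumptions made for this example; the actual keywords are those of Table 1, and the actual formula is the one given in equation (1).

```python
def build_query(keyword_groups):
    """OR the synonyms inside each group, then AND the groups together."""
    clauses = []
    for group in keyword_groups:
        terms = " OR ".join(f'"{term}"' for term in group)
        clauses.append(f"({terms})")
    return " AND ".join(clauses)

# Hypothetical keyword groups, used only to show the shape of the query.
groups = [
    ["recommendation system", "recommender system"],
    ["classification", "classifier"],
]

query = build_query(groups)
print(query)
```

A generic string like this one would still need to be adapted to the syntax and field restrictions of each digital library, as described above.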

Screening of Papers
There are different metrics to define the quality criteria that make a paper relevant. In this work, in addition to those related to the structure of the papers, the quality criteria were defined by whether the scientific papers found were classified in the following accepted indexes:
(i) "Journal Citation Report (JCR)" [16]

Multiple appearances must be eliminated
(v) C5, Criterion 5. As mentioned above, papers must be classified in the JCR or SCIE rankings
(vi) C6, Criterion 6. The reading of the abstract must fit the topic addressed

Finally, some recommendations from experts in the subject addressed in this SMS have also been considered. If these studies were not found after the execution of the different searches, they were included in the final selection of primary studies.
Once the quality criteria and the inclusion and exclusion criteria had been defined, the screening of the papers was performed. Applying C1, which restricts the scope to "Computer science," yielded a total of 923 results, discarding 272 papers that did not meet this criterion. C2 was applied to the 923 papers obtained from C1, resulting in 909 papers. C3 was then applied to the results obtained from C2, leaving a total of 432 results. Once C4 was applied, a total of 96 papers were removed, leaving 336. Applying C5 resulted in a total of 259 papers. The last filter, C6, resulted in 99 papers, since 160 of the removed ones did not fit the topic of this research. Finally, repeated papers were removed; this step eliminated duplicated entries between the different digital libraries.
The result of applying all the quality criteria and the inclusion and exclusion criteria was a total of 80 primary studies, which are categorized using the classification schema. The number of papers found corresponds to roughly 6.7% of the results found in the first search (80 of 1,195). Table 3 shows the primary studies selected. Figure 2 shows the list of keywords discovered in the different primary studies. In this figure, the keywords are classified based on the total number of matches found across all these primary studies. Figure 3 depicts the complete process of selecting primary studies. It shows the search procedure for each digital library and the results after the application of each of the quality, inclusion, and exclusion criteria.
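The screening arithmetic above can be checked with a short sketch. The counts are the ones reported in the text; the stage labels and the helper name `funnel` are chosen only for this illustration.

```python
# Number of papers remaining after each screening step, as reported in the text.
stages = [
    ("initial search", 1195),
    ("C1", 923),
    ("C2", 909),
    ("C3", 432),
    ("C4", 336),
    ("C5", 259),
    ("C6", 99),
    ("duplicates removed", 80),
]

def funnel(stages):
    """Return (step, remaining, removed_at_step) for every step after the first."""
    rows = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        rows.append((name, after, before - after))
    return rows

rows = funnel(stages)
retention = 100 * stages[-1][1] / stages[0][1]

for name, remaining, removed in rows:
    print(f"{name:>18}: {remaining:4d} remaining, {removed:3d} removed")
print(f"retained: {retention:.1f}% of the initial results")
```

The per-step differences reproduce the removals stated in the text (272 at C1, 96 at C4, 160 at C6), and the final share is 80 of 1,195.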
By the same token, the process carried out previously was executed for the industrial scope to detect and select the primary technologies or tools that companies offer. The search engines returned multiple results (Table 4), with a total of 21 proposals remaining as potential candidates.

Table 3: Selected primary studies.

Title | Reference
Building accurate and practical recommender system algorithms using machine learning classifier and collaborative filtering | [20]
DGA botnet detection using collaborative filtering and density-based clustering | [21]
A multistage collaborative filtering method for fall detection | [22]
Analysis and performance of collaborative filtering and classification algorithms | [1]
Extracting a vocabulary of surprise by collaborative filtering mixture and analysis of feelings | [4]
Content based filtering in online social network using inference algorithm | [23]
Building switching hybrid recommender system using machine learning classifiers and collaborative filtering | [8]
Imputation-boosted collaborative filtering using machine learning classifiers | [24]
CRISP-an interruption management algorithm based on collaborative filtering | [25]
A credit scoring model based on collaborative filtering | [26]
Collaborative filtering recommender systems | [2]
An improved switching hybrid recommender system using naive Bayes classifier and collaborative filtering | [6]
Tweet modeling with LSTM recurrent neural networks for hashtag recommendation | [27]
A two-stage cross-domain recommendation for cold start problem in cyber-physical systems | [28]
ELM based imputation-boosted proactive recommender systems | [29]
Twitter-user recommender system using tweets: a content-based approach | [30]
A personalized time-bound activity recommendation system | [31]
Automated content based short text classification for filtering undesired posts on Facebook | [32]
Shilling attack detection in collaborative recommender systems using a meta learning strategy | [33]
Building a distributed generic recommender using scalable data mining library | [34]
Context-aware movie recommendation based on signal processing and machine learning | [35]
Recommender systems using linear classifiers | [36]
A survey of accuracy evaluation metrics of recommendation tasks | [3]
Incorporating user control into recommender systems based on naive Bayesian classification | [37]
Classification features for attack detection in collaborative recommender systems | [38]
Automatic tag recommendation algorithms for social recommender systems | [39]
Optimizing similar item recommendations in a semistructured marketplace to maximize conversion | [40]
Capturing knowledge of user preferences: ontologies in recommender systems | [41]
Emotion-based music recommendation using supervised learning | [42]
AWESOME-a data warehouse-based system for adaptive website recommendations | [43]
Lexical and syntactic features selection for an adaptive reading recommendation system based on text complexity | [5]
A smart-device news recommendation technology based on the user click behavior | [44]
Recommendation as link prediction in bipartite graphs: A graph kernel-based machine learning approach | [45]
A novel approach towards context based recommendations using support vector machine methodology | [46]
A smartphone-based activity-aware system for music streaming recommendation | [47]
An app usage recommender system: improving prediction accuracy for both warm and cold start users | [48]
Proposing design recommendations for an intelligent recommender system logging stress | [49]
A recommender system based on implicit feedback for selective dissemination of eBooks | [50]
A novel recommender system based on FFT with machine learning for predicting and identifying heart diseases | [51]
Shilling attack detection in collaborative recommender systems using a meta learning strategy | [33]
An approach to content based recommender systems using decision list based classification with k-DNF rule set | [52]
Probabilistic approach for QoS-aware recommender system for trustworthy web service selection | [53]
Approach to cold-start problem in recommender systems in the context of web-based education | [54]
Context and intention-awareness in POIs recommender systems | [55]
A collaborative filtering-based re-ranking strategy for search in digital libraries | [56]
Learning users' interests by quality classification in market-based recommender systems | [57]
Mobile content recommendation system for revisiting user using content-based filtering and client-side user profile | [58]
A hybrid collaborative filtering algorithm based on KNN and gradient boosting | [59]
A scalable collaborative filtering algorithm based on localized preference | [60]
Recommended or not recommended? Review classification through opinion extraction | [61]
Meta-feature based data mining service selection and recommendation using machine learning models | [62]
Personalized channel recommendation deep learning from a switch sequence | [63]
Affective labeling in a content-based recommender system for images | [64]
A novel approach towards context sensitive recommendations based on machine learning methodology | [65]
A distance-based approach for action recommendation | [66]
Ranking and classifying attractiveness of photos in folksonomies | [67]
Consequences of variability in classifier performance estimates | [68]
Machine learning and lexicon based methods for sentiment classification: a survey | [9]
Machine learning algorithm selection for forecasting behavior of global institutional investors | [69]
Towards rapid interactive machine learning: evaluating tradeoffs of classification without representation | [70]
Towards a method for automatically evolving Bayesian network classifiers | [71]
A machine learning based trust evaluation framework for online social networks | [72]
Automated problem identification: regression vs. classification via evolutionary deep networks | [73]
Empirical evaluation of ranking prediction methods for gene expression data classification | [74]
Inferring contextual preferences using deep autoencoding | [75]
Automatic recognition of text difficulty from consumers health information | [76]
A hybrid approach for automatic model recommendation | [77]
Learning instance greedily cloning naive Bayes for ranking | [78]
Pairwise-ranking based collaborative recurrent neural networks for clinical event prediction | [79]
Accurate multi-criteria decision making methodology for recommending machine learning algorithm | [80]
A general extensible learning approach for multidisease recommendations in a telehealth environment | [81]
An efficient recommendation generation using relevant jaccard similarity | [82]
An image-based segmentation recommender using crowdsourcing and transfer learning for skin lesion extraction | [83]
Automatic classification of high resolution land cover using a new data weighting procedure: the combination of k-means clustering algorithm and central tendency measures (KMC-CTM) | [84]
Building a hospital referral expert system with a prediction and optimization-based decision support system algorithm | [85]
Classification techniques on computerized systems to predict and/or to detect apnea: a systematic review | [86]
Identification of category associations using a multilabel classifier | [87]
Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems | [88]
S3Mining: a model-driven engineering approach for supporting novice data miners in selecting suitable classifiers | [89]
The use of machine learning algorithms in recommender systems: a systematic review | [11]

Keywording using Abstracts
To create the classification scheme for categorizing the selected primary studies, an attempt was made to answer each of the research questions formulated in the planning phase and, in addition, to identify each of them with a set of features. Moreover, two complete iterations were carried out to classify all the studies and to verify that the features found covered the content of each study. Table 5 shows and describes the classification scheme defined. The process for defining the classification scheme was then repeated for the industrial area.
By answering the research questions and extracting the features of the technologies, a classification scheme was defined (Table 6).

Scientific Report.
This section describes the most important aspects obtained from the information collected. To achieve this purpose, each of the research questions will be answered and validated, showing the data obtained for each of them. It is important to note that some of the features may appear in several studies; therefore, the totals may not always correspond to 100%.
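The remark that totals may not add up to 100% follows from the fact that one study can carry several features. The sketch below illustrates this with four hypothetical studies; the study identifiers and their feature assignments are invented for the example and are not taken from the data of this SMS.

```python
from collections import Counter

# Hypothetical tagging of four studies; a study may carry several features.
studies = {
    "S1": {"method"},
    "S2": {"method", "algorithm"},
    "S3": {"complete system"},
    "S4": {"analysis", "framework"},
}

def feature_shares(studies):
    """Percentage of studies that carry each feature."""
    counts = Counter(f for features in studies.values() for f in features)
    return {f: 100 * n / len(studies) for f, n in counts.items()}

shares = feature_shares(studies)
for feature, pct in sorted(shares.items()):
    print(f"{feature}: {pct:.2f}%")
print(f"sum of shares: {sum(shares.values()):.2f}%")
```

Because studies S2 and S4 are counted under two features each, the shares sum to more than 100% even though every percentage is computed over the same four studies.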
(i) Research Question RQ1 identifies the methods, techniques, and/or tools that have been investigated for classification and recommendation systems. Figure 4 shows that the predominant type of study is methods, which represent 35.00% of the total of the studies, followed by complete system studies, with 23.75%. The rest of the studies correspond to algorithms with 20.00%, analysis with a presence of 18.75%, and finally, frameworks with 6.25% of the total primary studies. From a software development life-cycle perspective (and avoiding methodological discussions), the requirements and analysis phases differ from the design phase because they are earlier stages, closer to the business (or the application model), and completely technology independent. The works found are contextualized in the technological design phase. No work contextualized in the early stages (business requirements or analysis) was found.
(ii) Research Question RQ2 seeks to know the validation of the studies found, which may be practical or theoretical, identifying whether they are within the scientific or industrial scope. The results obtained (Figure 5) show that all the primary studies were academically focused. Most of them were validated in some way (97.50%), while 10.00% were not validated. It is important to note that three different groups have been distinguished within the validation category. The experimentation subgroup includes all those studies whose proposal was tested and validated by experimentation with synthetic and real data sources. This group contains most of the validated results, 72.50% of the total. Another important category is the one that validates the proposals by a case study, which represents 13.75%. Only 5.00% of the primary studies were carried out through surveys, and just one primary study was focused on the industrial context, representing 1.25% of the total.
Furthermore, the classification group is described, where both supervised and unsupervised learning features are presented. Two features stand out for their use: naive Bayes, which classifies according to probabilities, with 28.75%, and support vectors, representing 20.00% of the total. Target based and Random Forest are the least used, with a presence of just 1 primary study.
(iv) Research Question RQ4 indicates the main points of interest of the research and which areas have been less investigated. This interest is classified into four categories: novelty, analysis, research, and improvement (Figure 7). The novelty category contains those primary studies whose goal is to present something that was lacking in the literature; it represents 22.50%, with 18 primary studies. The analysis category contains those results that are a comparison or study of different existing techniques, and it represents 7.50% of the total. The improvement category represents the 30.00% of the results whose main objective is to improve an existing approach. Finally, the largest category is research, where existing or new approaches in the literature are investigated. It represents 36.25% of the total, with 29 primary studies.
Finally, it is interesting to analyze other results that are related not to the research questions but to the objective of this document. These results help to understand the evolution of research on classification and recommendation systems.
(i) Figure 8 shows the trend of publication in topics related to classification and recommendation systems. The chart shows that the trend has increased in recent years, so it can be deduced that it is a subject of high interest to the scientific community. It is important to note that, at the beginning of 2019, there are already more than half of the papers selected for the previous year.
(ii) Figure 9 presents the number of papers obtained for each of the digital libraries and the relationship with those finally selected for further study. The initial results are shown in light green, highlighting ACM with 27 papers, followed by SCOPUS and IEEE Xplore with 23 and 14, respectively. ScienceDirect returned only 4 results. Dark green shows the finally selected studies of each digital library.

This feature defines whether the solution proposed by the primary study is based on or composed of a recommendation system with a content-based filter
Collaborative: This feature defines whether the solution proposed by the primary study is based on or composed of a recommendation system with a collaborative filter
Hybrid: This feature defines whether the solution proposed by the primary study is based on or composed of a combination of collaborative and content-based filters
Graph kernel: This feature defines whether the primary study is based on or composed of a graph-based classifier
Naive Bayes: This feature defines whether the primary study is based on or composed of a naive Bayes probabilistic classifier
Logistic regression: This feature defines whether the primary study is based on or composed of a classifier by logistic regression
Decision tree: This feature defines whether the primary study is based on or composed of a classifier by decision trees

Industrial Report.
After the description of the results obtained from the scientific report, this section presents the data obtained from conducting the study of the industrial scope.
(i) Research Question RQ1 identifies the products that have been developed for classification and recommendation systems. Figure 10 shows that the most frequent results have been complete systems and libraries or frameworks, with 5 and 4 proposals, respectively. The next two features are the APIs and tools, representing 3 and 4 proposals, respectively. In last place is the platform feature, with just one proposal found. The sum of the complete systems and the libraries represents 47.62% of the total of the proposals. The set of technologies that represent the APIs is 14.29%, the tools 9.52%, and finally, the platform 4.76% of the total. From a software development life-cycle perspective (and avoiding methodological discussion), the requirements and analysis phases differ from the design phase because they are earlier stages, closer to the business (or application model), and completely technology independent. The works found are contextualized in the technological design phase. No work contextualized in the early stages (business requirements or analysis) was found.
(ii) Research Question RQ2 aims to determine whether the products obtained in this scope are free or proprietary software. This classification is of interest for identifying the products that may entail an extra cost for the execution of the project. According to the defined taxonomy, Figure 11 shows that the results lean toward the open side: commercial software, with 8 proposals, represents 38.10% of the total, and free software technologies, with 12 results, represent 57.14% of the total.
(iii) Research Question RQ3 seeks to identify the nature of the products found. The results obtained according to the taxonomy built after feature extraction are shown in Figure 12. The largest group corresponds to Python, with 7 results, representing 33.33% of the total. The next largest group is R, with 6 results (28.57%). Java follows, representing 19.05% of the total. Next comes Apache Spark, with 3 proposals (14.29% of the total). Finally, two technologies appear only once each, Node and Ruby, together accounting for 9.52% of the total proposals found. Within this research question, it should be noted that much of the proprietary software does not disclose the technology it is based on, so these products were included in the category "others". This category accounts for 14.29% of the results, with 3 proposals.
(iv) Research Question RQ4 identifies the main objective of the technology. In this case, two different groups have been established: classification and recommendation systems (Figure 13). A total of 10 proposals offer a classification system, representing 47.62% of the technologies. Recommendation systems are offered by 16 of the proposals, that is, 76.19% of the technologies. Finally, it is important to note that 28.57% of the total (6 proposals) offer both recommendation and classification.
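The percentages reported for RQ2 to RQ4 are consistent with each count being divided by a common total of 21 proposals. That total is not stated explicitly in this section; it is an assumption inferred from the fact that a single proposal corresponds to 4.76%. A minimal Python sketch of the arithmetic, with hypothetical names, could look as follows:

```python
# Assumed total number of industrial proposals. This value is inferred from
# "one proposal = 4.76%" and is not stated explicitly in the text.
TOTAL_PROPOSALS = 21


def share(count: int, total: int = TOTAL_PROPOSALS) -> float:
    """Return the percentage of proposals, rounded to two decimals."""
    return round(100 * count / total, 2)


# Counts as reported in the text for RQ2, RQ3, and RQ4.
reported = {
    "commercial": 8,
    "free": 12,
    "Python": 7,
    "R": 6,
    "Apache Spark": 3,
    "classification": 10,
    "recommendation": 16,
}

shares = {name: share(count) for name, count in reported.items()}

for name, value in shares.items():
    print(f"{name}: {value:.2f}%")
```

Because a product can belong to more than one category (for example, both classification and recommendation), the shares within one research question need not add up to 100%.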

Conclusion and Future Work
This research has involved an in-depth study of recommendation and classification systems, presenting an SMS that illustrates the current state of the art of these systems. In addition, the study is intended to support decision-making about the algorithms to be implemented in the ADAGIO project.

Unlike most SMSs, which focus on the scientific literature, this study has been carried out from two points of view, as discussed throughout the paper: the scientific and the industrial scopes.
A total of 80 primary studies obtained from the main digital libraries were analyzed. Within the scientific field, the results showed that the most studied technique in recommendation systems is collaborative filtering, closely followed by content-based filtering. Only 14 studies used hybrid recommendation systems, whereas 31 used collaborative filtering and 29 used content-based methods. This is useful guidance for researchers starting to work with recommender systems, as it indicates which techniques are most popular and most used in the scientific environment. As there are more recommender systems than classification models, it seems that recommendation is well known to scientific researchers, and the most used technique is collaborative filtering.
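To make the distinction between the three filter types concrete, the following minimal sketch contrasts them on toy data. All ratings, item features, and function names are hypothetical and do not come from any primary study; cosine similarity and the equal-weight hybrid are common choices assumed here purely for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical user-item ratings; 0 means "not rated".
ratings = {
    "ana": [5, 4, 0, 1],
    "ben": [4, 5, 5, 1],
    "cara": [1, 1, 2, 5],
}

# Hypothetical item descriptions (two binary features per item).
item_features = [[1, 0], [1, 0], [1, 1], [0, 1]]


def collaborative_score(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or other_ratings[item] == 0:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else 0.0


def content_score(user, item):
    """Average of the user's own ratings, weighted by item-feature similarity."""
    num = den = 0.0
    for j, rating in enumerate(ratings[user]):
        if j == item or rating == 0:
            continue
        sim = cosine(item_features[item], item_features[j])
        num += sim * rating
        den += sim
    return num / den if den else 0.0


def hybrid_score(user, item, alpha=0.5):
    """Linear blend of both scores; alpha is an assumed mixing weight."""
    return alpha * collaborative_score(user, item) + (1 - alpha) * content_score(user, item)


cf = collaborative_score("ana", 2)
cb = content_score("ana", 2)
hy = hybrid_score("ana", 2)
print(f"collaborative={cf:.2f} content-based={cb:.2f} hybrid={hy:.2f}")
```

The collaborative score relies only on what other users rated, the content-based score relies only on item descriptions and the user's own history, and the hybrid score combines the two.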
In the case of classification solutions, the most researched alternatives are naive Bayes, support vector machines (SVM), and neural networks, which together represent almost 55% of the techniques used for this purpose. These results are due to the large number of studies oriented to social networks, which account for a large part of Internet traffic.
It is important to point out that all the studies analyzed in the scientific field were found to be of a theoretical nature; i.e., none of them are within the industrial scope. Although many of the proposals present a validation, few of them use real data sources instead of synthetic ones (artificially generated rather than generated by real-world events) to carry out their experiments. In this sense, a lack of technology transfer of these proposals to real case studies has been detected.
Furthermore, market research conducted through systematic industrial mapping showed that many technologies offer machine learning solutions, most of which are complete systems or libraries. However, the nature of many of them could not be determined because the proprietary software did not disclose it. Another important point is that interest in this topic is not limited to free software developer communities; large companies are also working on it for commercial purposes. This clearly shows the underlying economic interest, an indicator that this is a line of research with long-term prospects.
During this research, few studies were found that offered improvements to specific problems through the combination of recommendation and classification systems, which was the main motivation for this work. In the literature analyzed, the most interesting solutions, algorithms, and technologies were also found to be used independently for classification and regression.
This research is useful not only for researchers trying to use both models at the same time but also for analysts trying to do just classification or just regression. As future work, a very interesting research line may focus on how to combine these systems to obtain more efficient and effective solutions.
From a software development life-cycle perspective (and avoiding methodological discussions), the requirements and analysis phases differ from the design phase in that they are earlier stages, closer to the business (or application model), and completely technology independent. This SMS shows that the majority of the work carried out in the ML research and industrial fields (combining classification and recommendation algorithms) corresponds to the design and implementation phases but is far from offering solutions in earlier stages such as requirements and analysis. This makes it very difficult to find efficient and effective solutions that support real business needs from an early stage. The present work justifies opening new ML research lines to support information system development from the early stages. A hypothetical solution proposal could be to provide business analysts with theoretical frameworks and support tools that facilitate the efficient and effective resolution of problems and that, subsequently, allow the automation of their design and implementation. Specifically, this solution could consist of the definition of a theoretical framework:

Foundational Knowledge
(i) Archetype Models for the Different Application Domains. This model is used for the conceptualization, formalization, and categorization of the application domains under study. The objective is to understand which application domains exist and which basic information structure should support each application domain. Through the development of these predefined archetype models, information structures could be offered in a systematic way in order to support the different existing problems.

(ii) Classification and Recommendation Template Methods to be Applied to Archetype Models. This model is used for the conceptualization, formalization, and categorization of ML solutions (combining classification and recommendation algorithms) for all those application domains that have been defined by means of archetype models. The objective is to facilitate the development of a framework that allows the automatic generation of ML solutions and that, in addition, could adjust the classification and the recommendation according to the needs of each application domain.

Applied Knowledge
(i) From a strategic point of view, understanding the strategy as a set of ordered stages or phases (phase 1: classification; phase 2: recommendation): define ML solution strategies based on the combination of classification and recommendation algorithms. In other words, determine to what extent and in what manner (iterative or iterative-incremental) the classification and recommendation phases should be combined for a more efficient and effective use of these algorithms in problem solving. In addition, these strategies may depend on the application domain being studied, so it is necessary to determine which strategic configurations are most appropriate for each application domain. The idea is to facilitate decision-making by automating decisions once a particular application domain or problem is entered.
(ii) From a tactical point of view: determine which machine learning methods, techniques, and tools are the most effective and efficient for applying the previous strategies, identifying the most appropriate ones for each phase (classification and recommendation) according to the application domain under study.
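One possible reading of the two-phase strategy described above (phase 1: classification; phase 2: recommendation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ADAGIO design: the training examples, the catalogue, the token-overlap classifier, and the popularity-based ranking are all hypothetical stand-ins for whichever methods the tactical analysis would select.

```python
from collections import Counter

# Hypothetical labeled examples for phase 1: (query tokens, application domain).
training = [
    (["hotel", "beach", "flight"], "tourism"),
    (["museum", "hotel", "tour"], "tourism"),
    (["invoice", "tax", "loan"], "finance"),
    (["loan", "bank", "interest"], "finance"),
]

# Hypothetical catalogue for phase 2: item -> (application domain, popularity).
catalogue = {
    "city-guide": ("tourism", 0.9),
    "beach-map": ("tourism", 0.7),
    "tax-helper": ("finance", 0.8),
    "loan-simulator": ("finance", 0.6),
}


def train(examples):
    """Count token occurrences per label (a very simple bag-of-words model)."""
    counts = {}
    for tokens, label in examples:
        counts.setdefault(label, Counter()).update(tokens)
    return counts


def classify(tokens, counts):
    """Phase 1: choose the label whose training tokens overlap the query most."""
    return max(sorted(counts), key=lambda label: sum(counts[label][t] for t in tokens))


def recommend(label, k=2):
    """Phase 2: rank the catalogue items of the predicted label by popularity."""
    items = [(score, name) for name, (domain, score) in catalogue.items() if domain == label]
    return [name for score, name in sorted(items, reverse=True)[:k]]


def two_phase(tokens, counts, k=2):
    """Run classification first, then recommendation within the predicted class."""
    label = classify(tokens, counts)
    return label, recommend(label, k)


model = train(training)
label, suggestions = two_phase(["hotel", "tour"], model)
print(label, suggestions)
```

In this sketch the phases run once in sequence. An iterative variant would feed user feedback on the suggestions back into the training examples before the next classification round.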
Finally, we can conclude that, even after this rigorous study, it remains very difficult to decide which algorithm is better than another, since this depends on the context in which it is used. There is no generic classifier or recommender, and several should be implemented depending on the type of data. The choice also depends on the desired level of complexity and the cost of misclassification. In conclusion, there is no single best model, and everything depends on the characteristics of each problem. In this sense, another possible line of future work is to characterize these systems with formal methods (e.g., QuEF [111]) to reduce the cost of making decisions about them.

Conflicts of Interest
The authors declare that they have no conflicts of interest.