The effect of Facebook behaviors on the prediction of review helpfulness

Facebook reviews contain review and reviewer information and include a set of likes, comments, sharing, and reactions called Facebook Behaviors (FBs). We extend existing research on review helpfulness to fit Facebook reviews by demonstrating that Facebook behaviors can impact review helpfulness. This study proposes a theoretical model that explains review helpfulness based on FBs and baseline features. The model is empirically validated using a real Facebook data set and different feature selection (FS) methods to determine the importance of each feature and to maximize helpfulness prediction. Consequently, a combination of the most impactful features is identified based on a robust and effective model. In this context, the like and love behaviors deliver the best predictive performance. Among the baseline features, the review linguistic and subjectivity features exhibit the best predictive performance. Furthermore, we employ different classification techniques and a set of influential features. The proposed model achieves an accuracy of 0.925. The outcomes of the current study can be applied to develop a smart review ranking system for Facebook product pages.

, Zhou and Guo [2017]. On the other hand, review subjectivity is explored using sentiment analysis, which covers several aspects, such as feature, opinion, and polarity Duan et al. [2012], Verma and Davis [2021]. The reviewer's characteristics reflect the reviewer's personal information, such as name, location, and reputation Ngo-Ye and Sinha [2014], Mauro et al. [2021]. Besides, like posts and comments, a Facebook review carries a set of Facebook behaviors (likes, comments, sharing, and reactions) made by other users who read and react to the review. In February 2016, Facebook rolled out the reaction feature consisting of five pre-defined reactions, namely "love", "haha", "wow", "sad" and "angry", which enable users to express their emotions wordlessly Smieško [2016]. Several studies have considered these components to analyze posts and comments in many fields Van Hooijdonk and Van Charldorp [2019], Kaur et al. [2019], Mohammad [2016]. These studies revealed that the number of Facebook behaviors helps not only to learn more about users but also to support targeted advertising Van Hooijdonk and Van Charldorp [2019] and to improve health care Kaur et al. [2019], education Mohammad [2016], and political affairs Del Vicario et al. [2017], Lin [2017], Alashri et al. [2018]. However, no study has integrated such features to improve the accuracy of Facebook review classification (i.e., helpful/not helpful). For this reason, our study addresses the following three research questions:
• Is there any relationship between the helpfulness of a Facebook review and Facebook behaviors?
• What is the most relevant set of features that leads to the most effective helpfulness prediction?
• Which Machine Learning (ML) method provides better performance for helpfulness prediction on Facebook?
This research aims to investigate the combined effect of standard features and Facebook behaviors on Facebook reviews and to assess their relative importance. One objective is to compare the combined set (standard features + Facebook behaviors) with the state-of-the-art baseline. Another objective is to identify the set of features that leads to the best helpfulness prediction. A further objective is to identify the best-performing machine learning model. To fulfill these objectives, we collected Facebook reviews from the official pages of 17 well-known cloud providers. Three feature selection methods are used to analyze the helpfulness features and are evaluated based on their robustness and performance. Besides, five ML methods are examined to build the helpfulness prediction model from the most effective set of features and the best-performing classifier. Theoretically, we make three contributions to the literature. First, this paper suggests a new model that enables, for the first time, the assessment of Facebook review helpfulness. The findings indicate the impact of the review linguistic and subjectivity feature categories among the baseline features. Second, we explore the possibility of integrating the number of likes, comments, sharing, and reactions as helpfulness features to improve helpfulness prediction performance. The findings highlight the significant impact of the love and like behaviors and the lesser impact of the sharing behavior. Third, this study also provides practical implications. The proposed model can be the building block of a smart review ranking system that helps users of Facebook product pages find relevant information easily.
The remainder of this paper is organized as follows: Section II presents the related work, followed by Section III, which details the methodology used. Subsequently, Section IV discusses preliminary evaluation results. Section V presents discussions and highlights the implications of this research, before concluding remarks and future work in Section VI.

Review helpfulness
Review helpfulness assesses the informativeness of reviews in terms of understanding and evaluating a product or a service Cao et al. [2011], Filieri [2015]. In the past two decades, scholars have assessed review helpfulness according to different features considered as clues to helpfulness. These features are deduced either from the review content, such as readability, visibility, verbs, etc., or from the review/reviewer characteristics, such as productivity score and expertise. More details about helpfulness features are presented in Figure 1 and in Section 3.3.1. To analyze the impact of the proposed features on review helpfulness, several studies Ren and Hong [2019], Craciun et al. [2020] applied Tobit regression. Eslami et al. [2018] relied on ANOVA to reveal the impact of each feature on the helpfulness of online consumer reviews. In the study of Malik and Hussain [2018], the authors applied the Pearson correlation method to examine the relationship between each feature and review helpfulness. In Ngo-Ye et al. [2017], the authors used correlation-based feature selection to identify a subset of features that are highly correlated with review helpfulness. For their part, Ghose and Ipeirotis [2011] analyzed several features of the review text, such as spelling errors, readability, and subjectivity, and examined their effect on sales as well. They discovered that linguistic correctness is a critical factor affecting sales. In addition, there is an intuition that concise, medium-length reviews with few spelling errors are more useful to customers than very short or very long reviews with many spelling errors.
Moreover, Ghose and Ipeirotis [2011] developed three taxonomies for the characteristics of the review text, such as the ease of reading a review, spelling errors in the review, and the degree of subjectivity. As for Malik and Hussain [2018], they demonstrated the importance of helpfulness per day, syllables, and auxiliary verbs as helpfulness features. They attributed the high significance of helpfulness per day to the fact that a review written by a reviewer with a large helpfulness-per-day value attracts more readers and receives more helpful votes. The great importance of the syllable variable indicates that reviews containing more multi-syllable words attract more customers and facilitate their purchasing decisions. Besides, customers prefer reviews that use more auxiliary verbs in the text. The authors also proved the high importance of the space and productivity score variables. Lee et al. [2018] described the reviewer's features as important predictors of review helpfulness since they perform best in classification. In contrast, review quality and review sentiment are poor predictors of review helpfulness. Meanwhile, the study of Ngo-Ye et al. [2017] investigated cognitive scripts for review helpfulness. Eslami et al. [2018] used the review length, score, and subjectivity (argument frame) and found that the most helpful reviews are those associated with a medium length, lower review scores, and a negative or neutral argument frame. The review title is also considered one of the helpfulness predictors. For instance, one study examined the similarity and the sentiment consistency between the review content and its title. The results revealed that the title-content similarity positively impacts review helpfulness. Moreover, the authors in Ren and Hong [2019], Craciun et al. [2020] found that emotional content is an essential factor in the perceived helpfulness of a review.
Then, the authors in Ren and Hong [2019] indicated that the product type moderates the impact of three discrete emotions (sadness, fear, anger) on review helpfulness. Notably, they revealed the negative effect of the sadness emotion on review helpfulness, contrary to the fear emotion, which has a positive impact. The anger emotion impacts review helpfulness more negatively for an experience product than for a search product. Meanwhile, the study in Craciun et al. [2020] examined the correlation between the reviewer's gender and the contextual emotional tone for review helpfulness. The authors of this study demonstrated that the reviewer's gender moderates the impact of emotional content on review helpfulness and that the readers' perceptions of the reviewer's credibility explain this effect. Their findings also revealed the relationship between female-expressed anger and review helpfulness. On the other hand, prediction models are more accurate and can thus be implemented further in practice Lee et al. [2018]. Notably, classification and regression techniques have been used to build prediction models (see Figure 1). Ngo-Ye et al. [2017] used a text regression model to estimate the helpfulness of customer reviews. The results showed that the proposed model offers higher accuracy with low training and testing times. As for Malik and Hussain [2018], they analyzed six standard machine learning techniques (NC-PQR, CART, MAR, NNET, RandF, and Stochastic GB). They then revealed that the stochastic gradient boosting ML model is the most effective method and that the proposed hybrid determinants show the best performance. Meanwhile, Lee et al. [2018] demonstrated that the RF classification algorithm is the most effective in predicting helpful reviews across all datasets of TripAdvisor reviews.
Journal of Data Mining and Digital Humanities ISSN 2416-5999, an open-access journal
In Ghose and Ipeirotis [2011], the SVM and Random Forest (RF) classifiers were used to predict helpful reviews, and RF outperformed SVM in all cases. As for the study of Eslami et al. [2018], the authors used an artificial neural network approach to predict review helpfulness. A. Lopez and R. Garza performed a topic modeling analysis to extract the main topics that consumers express in their reviews. Thereafter, the topics were used as regressors to predict the number of consumers who found the review helpful and to test the serial mediation effect. More recently, Li et al. [2022] empirically examined the effects of the numerical features used in online review comments on perceived review helpfulness and the underlying psychological mechanisms. The main findings of this study highlight the positive correlation between the numerical features in online review comments and perceived review helpfulness across different product categories. This relationship is mediated by two psychological responses of consumers: cognitive elaboration and credibility perception. On the other hand, Lee et al. [2021] proposed several prediction models for the helpfulness of Yelp business reviews using a variety of machine learning techniques, namely multivariate linear regression, random forest, support vector machine regression, and extreme gradient boosting (XGBoost). The results showed that XGBoost outperforms the other selected ML algorithms in predicting review helpfulness. They also revealed that the reviewer's credibility is an important feature for assessing review helpfulness.

Facebook behaviors
There are two types of interaction on Facebook: active interaction, such as liking, sharing, commenting, and reacting, and passive interaction, such as clicking, watching, or viewing Ekström and Östman [2015], Kaur et al. [2019]. This research focuses on active interactions as they are the publicly available responses. Furthermore, the literature has demonstrated that this interaction type can help understand human behavior on social media Ding et al. [2017], Ross et al. [2018], Kaur et al. [2019]. In this context, Zell and Moeller [2018] showed that the like behavior is the fastest way for users to communicate on Facebook. It enables them to express their agreement with specific comments, pictures, wall posts, statuses, or fan pages. As for Sumner et al. [2018], they revealed that individuals often use the like button to show appreciation for the content of a post and to get closer to the poster. In the marketing area, Ding et al. [2017] demonstrated the strong correlation between the like behavior and the box office of movies. Similarly, Pelletier and Horky [2015] revealed a positive impact of the like behavior on product and service brands, as it occurs between the company and the consumer. They also showed that users were eight times more likely to hit the like button than the sharing or comment button, with 44% of them liking posted content at least once a day and 29% doing so several times a day. As for the sharing behavior, Rui and Stefanone [2013] showed that there is a strong link between sharing an item and the users' self-presentation. For instance, when presenting themselves on Facebook, users carefully consider public self-evaluation and check whether their online self-presentation is compatible with their offline self-presentation.
On the other hand, comments are content that users can write under posted items. In the literature, many studies have proved the importance of considering Facebook comments for many purposes. In the field of crisis communication, Hong and Cameron [2018] confirmed the paramount role of the comments in influencing readers' perception of meaningful discussions.
In the political field, many studies Lin [2017], Alashri et al. [2018] showed the effect of comments on altering voters' opinions in elections. Moreover, the authors in de León and Trilling [2021] explored the relationship between political news valence, Facebook reactions, and news sharing during the 2018 Mexican elections. They used negative binomial regressions to analyze the relationship between reactions and news sharing. The findings revealed a strong relationship between negative news and the sad reaction, as well as between negative news and the wow reaction. Besides like, comment, and sharing, Facebook extended the like button by adding five reactions (love, haha, wow, sad and angry) to enable consumers to indicate, in one simple click, how the content of a post/review makes them feel. According to Turnbull and Jenkins [2016], Facebook reactions provide an opportunity for marketers to gain a better understanding of how consumers emotionally engage with social media content. Moreover, many studies used Facebook reactions to improve sentiment classification Kaur et al. [2019], Smieško [2016].

III METHODOLOGY
The present study's methodology for predicting review helpfulness in Facebook consists of six primary phases: data collection, data cleaning, feature engineering, feature selection, helpful Facebook review prediction, and result comparison. Figure 2 depicts the overall workflow adopted in this study.

Data collection
Since 2013, Facebook business pages have allowed user-generated ratings or recommendations and reviews on the page. Although most people use Facebook primarily as a way to connect with their social network, rating businesses and reading reviews within such a context have great potential, given that Facebook is the most popular social media platform and the one that most users visit daily Kaur et al. [2019]. Unfortunately, the Facebook Graph API does not support the collection of reviews. Therefore, we propose a new crawler, built with Node.js and using MongoDB as its database, that gathers reviews posted on Facebook. Following the Mongo model depicted in Figure 3, the crawler collects from official Facebook pages the data related to the review as well as to the reviewer, such as the review content, the date of the review, the rating/recommendation, and the reviewer (User) profile, which contains reviewer information such as job position and education level. Moreover, as depicted in Figure 3, the numbers of likes, reactions, comments, and sharing were also extracted. We extracted Facebook reviews from 17 official pages of cloud providers to evaluate our approach. These pages receive an average of 4 reviews per day. The collected reviews were published between September 2019 and February 2022.

Data cleaning
The importance of "quality versus quantity" of data in social media scraping and analysis cannot be overlooked. Unstructured textual data can be very noisy (i.e., dirty). Therefore, data cleaning (or cleansing, scrubbing) is an important area for social media analytics. In this paper, the process of cleaning the data consists of removing useless content such as:
a) Hashtags and URL links.
b) Non-textual reviews and comments (photo, video, GIF file, etc.).
c) Reviews and comments that were less than three words long.
d) Reviews and comments that were written in a language other than English.
e) Reviews and comments that had five or more misspelled words. Misspelled words were considered as words that have been wrongly spelled either due to human mistakes or typos.
f) Special characters (, #, $, etc.).
Figure 3: Structure of the collected data
Besides, this phase aims to correct misspelled words using the wordnik API dictionary or to discard the word. We also removed reviews that do not contain any Facebook behavior. In this phase, spam reviews and spam comments are also detected and filtered using the approach of Ben-Abdallah et al. [2018]. This approach considers well-known spam features taken from the literature, to which we add two new ones: the user profile authenticity, to allow the detection of spam reviews from Facebook, and the opinion deviation, to verify the truthfulness of the opinion. The output of the data cleaning is a final numberOfReviews of raw data. Of these, approximately 20% (i.e. 5000) were randomly extracted for this study. The 5000 reviews then went through the standard pre-processing phase of POS tagging, tokenization, stop word removal, and stemming.
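A minimal sketch of cleaning rules a), c), and f), together with the removal of reviews without any Facebook behavior, is given below. It is an illustration under stated assumptions, not the crawler or cleaning code used in this study: the field names (`text`, `likes`, `comments`, `shares`, `reactions`) and the set of characters kept are hypothetical, and language detection, spell checking, and spam filtering are left out.

```python
import re

# Hypothetical patterns; the exact rules used in the study are not specified.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s.,!?']")


def clean_text(text):
    """Strip URLs, hashtags, and special characters, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return " ".join(text.split())


def clean_review(review, min_words=3):
    """Return a cleaned copy of the review, or None if it should be discarded.

    `review` is assumed to be a dict with a "text" field and optional
    behavior counts; these field names are illustrative only.
    """
    cleaned = clean_text(review.get("text", ""))
    if len(cleaned.split()) < min_words:
        return None  # rule c): fewer than three words
    behaviors = sum(
        review.get(key, 0) for key in ("likes", "comments", "shares", "reactions")
    )
    if behaviors == 0:
        return None  # reviews without any Facebook behavior are removed
    return {**review, "text": cleaned}
```

Applying `clean_review` to every collected record and keeping the non-`None` results gives the subset that would move on to the language, spelling, and spam checks.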

Data annotation
To annotate the Facebook reviews as helpful or not helpful, we rely on seven cloud instructors from the IT department of the University of Sfax (considered as experts). This annotation provides the ground truth for the helpfulness prediction of this study. An inter-rater reliability (IRR) analysis using subsamples was conducted, in which Krippendorff's alpha Krippendorff [2004] was calculated. The IRR analysis yielded 89% agreement.
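For readers unfamiliar with Krippendorff's alpha, the following sketch computes it for nominal labels (such as helpful/not helpful) from a coincidence matrix, as alpha = 1 - D_o / D_e, where D_o is the observed and D_e the expected disagreement. It is an illustrative implementation, not the tool used by the annotators' analysis; the function name and the input layout (one list of labels per review, `None` for a missing rating) are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list with one entry per review; each entry is the list of
    labels given by the annotators (None marks a missing rating).
    """
    coincidences = Counter()
    for ratings in units:
        ratings = [r for r in ratings if r is not None]
        m = len(ratings)
        if m < 2:
            continue  # a unit rated once carries no agreement information
        # Every ordered pair of ratings in the unit contributes 1 / (m - 1).
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one unit with two or more ratings")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label was ever used, so no disagreement is possible
    return 1.0 - observed / expected
```

With seven annotators, each entry of `units` would hold up to seven labels for one review.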

Feature engineering
This section introduces the features used as clues to identify helpful cloud reviews. Table 1 presents the list of the features used in this study. We define two types of features: independent features and Facebook dependent features. The first type is used to assess any review extracted from any social media platform. Meanwhile, the second one is to evaluate only Facebook reviews, named Facebook Behavioral features.

Helpfulness features
As depicted in Figure 1, the independent features for the review helpfulness prediction process can be either review based or reviewer based. The review based category contains two feature types: content-based features and meta-data based features. The reviewer based category has, in turn, two feature types: profile-based features and meta-data based features. In this study, we consider only the review based features because the reviewer based features are not available on Facebook. Three types of review based features are investigated in this research: review quality, review subjectivity, and review characteristic. As for review quality, we consider readability, linguistic, and visibility features. According to Hu and Chen [2016], the longer a review has been posted, the higher the possibility that it will receive votes on helpfulness. Based on this hypothesis, four features are considered in this study: the length of the review in characters, the average number of syllables per word, the length of the review in words, and the length of the review in sentences. As for the review subjectivity features, many studies have proved their importance as determinants of review helpfulness prediction Luo and Xu [2019], Malik and Hussain [2020]. In this study, we consider the ratio of comparison and the polarity of the review. This study also considers the review characteristic, which includes review age and review extremity.
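The four length-related features listed above can be sketched as follows. This is an illustration only: the tokenization rules and the vowel-group syllable heuristic are assumptions (the heuristic over-counts words with a silent "e"), and the feature names are hypothetical rather than those of Table 1.

```python
import re


def count_syllables(word):
    """Rough syllable count: number of vowel groups, at least one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def length_features(text):
    """Compute character, word, and sentence lengths and syllables per word."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "n_sentences": len(sentences),
        "syllables_per_word": syllables / len(words) if words else 0.0,
    }
```

A dictionary-based syllable counter would be more accurate; the heuristic is used here only to keep the sketch self-contained.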

Facebook Behavioral features
Besides the features described above, we also exploit the different Facebook behaviors that surround the review (likes, sharing, comments, and reactions). We hypothesize that reviews receiving more likes, comments, and sharing are associated with higher helpfulness. We also study in this paper the direction of the effect of Facebook reactions.

Feature selection
Feature selection (FS) is an important step in building machine learning models, as it enables the predictive model to achieve good, or even better, solutions with a restricted subset of features Saeys et al. [2008]. FS techniques aim to remove irrelevant and/or redundant features and to identify the most relevant ones in order to better understand the mechanisms of the subject of interest, instead of merely building a black-box predictive model. In the context of Facebook review helpfulness, feature importance must be studied, first, to understand the behavior of reviewers who write helpful reviews on Facebook and, second, to improve prediction performance by discarding irrelevant and disturbing features. To do so, we compare three FS methods belonging to different categories, i.e., embedded, wrapper, and filter methods. The most influential features are then analyzed using the most robust and best-performing FS method. The FS methods used are described as follows:
(1) Random Forest (RF) is an embedded feature selection method Genuer et al. [2010] that is often used for feature selection in a data science workflow. RF performs an implicit feature selection by growing multiple classification and regression trees (CART) Genuer et al. [2010]. The prediction is determined by the majority vote of all trees in the RF. The test error of RF models is estimated on the out-of-bag (OOB) data. After each tree has been grown, the inputs that did not participate in the training bootstrap sample are used as a test set; averaging over all trees then gives the test error estimate. The Gini index uses the decrease of impurity after a node split as a measure of feature relevance. In general, the larger the reduction of impurity after a particular split, the more informative the corresponding input variable. The average decrease in the Gini index over all trees in the RF defines the Gini importance (GI).
The Gini index is closely related to entropy, both being measures of impurity.
(2) Recursive Feature Elimination (RFE) is a wrapper feature selection method that evaluates multiple models by using different procedures. These procedures add and/or remove predictors until they find the optimal combination that maximizes the model performance Guyon and Elisseeff [2003].
(3) Analysis of Variance (ANOVA) is a filter-based feature selection method used to assess whether the means of two or more groups differ significantly from each other Gueorguieva and Krystal [2004].
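Two of the quantities named in the list above can be made concrete with a short sketch: the decrease in Gini impurity produced by a single node split (the building block of the RF importance) and the one-way ANOVA F-statistic of a feature across classes. The function names are hypothetical, and the code is a plain-Python illustration of the standard definitions, not the scikit-learn routines used in the study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c ** 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


def anova_f(groups):
    """One-way ANOVA F-statistic for a feature whose values are grouped by class."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Mean square between classes divided by mean square within classes.
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

A feature with a large F-statistic separates helpful from not helpful reviews well on its own, whereas the Gini decrease measures its usefulness inside a tree, conditional on the splits above it.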

Helpful Facebook review prediction
The techniques used to construct the prediction models were Random Forest (RF), AdaBoost (ADA), Bagging (BAG), Classification And Regression Tree (CART), and Iterative Dichotomiser 3 (ID3). These ML techniques are trained and tested in various experiments using baseline and Facebook behavior features. The size of the feature matrix is N * K, where K is the number of the most impactful and efficient features, and N is the number of tuples. In the experiment, 100 training sets and 100 test sets were sampled from each of the available data sets in a ten-fold replicated cross-validation Demšar [2006]. Following the test process in Demšar [2006], the data set was sub-sampled into ten groups: the different ML models were trained with data from nine of the groups and tested on the remaining group. Thus, approximately 90 percent of the randomly selected observations from the total data set were used for the training phase, while the remaining 10 percent served as the test set for assessing the output of the compared methods. Moreover, the training and prediction process was repeated ten times, and the model prediction was validated in each of the ten rounds. Using the real values of review helpfulness, the predictive performance was measured with the ML performance metrics defined in Section 4.1. All the above methods were implemented using Python 3.6.0, a high-level programming language. In particular, we used the scikit-learn library Pedregosa et al. [2011] to create and fit our models under Google Colaboratory (also known as Colab) Bisong [2019]. Colab is a cloud service based on Jupyter Notebooks for the dissemination of machine learning education and research. It provides a fully configured runtime for machine learning and free access to a robust GPU.
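The resampling scheme described above (ten-fold cross-validation repeated ten times, giving 100 training/test pairs) can be sketched with the standard library alone. The function name, the seed, and the shuffling strategy are assumptions made for illustration; the study itself relied on scikit-learn, and stratification by class is not shown.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Each repeat shuffles the sample indices and cuts them into `n_folds`
    groups; every group is used once as the test set.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for i, test in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            splits.append((train, test))
    return splits
```

For the 5000 reviews of this study, each pair would hold 4500 training and 500 test indices, and a classifier would be fitted and scored once per pair before averaging the metrics.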

IV EXPERIMENTS AND RESULT COMPARISON
This section presents the series of experiments conducted to assess the impact of Facebook behavior features on the degree of review helpfulness and the performance of the proposed features compared with the state-of-the-art baseline features. The experiments cover both the helpfulness prediction and the feature-wise analyses.

Performance metrics
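To evaluate the model performance, accuracy, precision, recall, and F-measure were considered in the present study. The confusion matrix in Table 2 was used to calculate these metrics as follows, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
Besides, we rely on the receiver operating characteristic (ROC) curve to assess the area under the curve (AUC) Hand [2009].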
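As a small worked illustration, the four metrics can be computed directly from the cells of a binary confusion matrix using their standard definitions (accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), and F-measure as the harmonic mean of precision and recall). The function name and the example counts are hypothetical and are not results of this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, with "helpful" taken as the positive class.
example = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

The guards against empty denominators are a design choice for the sketch; they return 0.0 when a metric is undefined, for instance when no review is predicted as helpful.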

Analysis of extracted reviews
We examined the collected reviews according to the review orientation (positive or negative). The distribution of likes and other Facebook reactions can be seen in Figure 4.
Perhaps due to the nature of the studied dataset, the reactions wow and haha are used in no more than 9% of cases. In fact, people can express agreement by choosing like or love, or sad if the review is negative, and express disagreement with the review opinion by choosing angry. For this reason, it can be observed in Figure 4 that around 71% of reviews, both positive and negative, contain at least one like. Besides, 58% of reviews include at least one love. For the negative reviews, we found that 41% contain at least one angry and 24% contain at least one sad. It can also be observed from the figure that people react to negative reviews, which suggests that negative experiences affect them. Figure 5 supports this hypothesis, since 60% of the reviews containing comments are negative ones. According to the collected reviews, few people share reviews, because sharing requires deeper cognitive processing Kaur et al. [2019], Kim and Yang [2017]. Another notable finding is that the longer the review's content is, the more likes, comments, and sharing it receives (Figure 4). Most probably, the length of a review pushes readers to pay greater attention when reading, which makes them think more deeply about the review and eventually brings more likes and comments Kaur et al. [2019], Kim and Yang [2017].

Performance analysis of feature selection
As illustrated in Figure 2 and Section 3.4, three feature selection methods are applied to identify the best one in terms of performance. The feature importance produced by these methods is fed into Support Vector Machine (SVM) and distance-based k-Nearest Neighbors (KNN) classifiers. KNN and SVM are two simple and intuitive classifiers that belong to different families of ML Pathak et al. [2019]. Furthermore, they are often considered powerful tools to assess the effectiveness of feature importance approaches Neumann et al. [2005]. This subsection is devoted to comparing the performance of the different feature selection methods, namely RF, ANOVA, and RFE, on the Facebook review data set to select the best one. Figure 6 plots the performance of KNN and SVM in terms of accuracy, AUC, and F-measure, averaged over 100 runs, against different numbers of helpfulness features. It can be observed that RF outperforms ANOVA and RFE by generally achieving the best values across all metrics. This suggests that RF ranks the features properly. Using KNN with the RF method, we can achieve outstanding performance (Accuracy = 0.87) with the top 16 features of the Facebook review data set, while SVM needs 23 features to achieve comparable results (Accuracy = 0.85).

Robustness analysis of feature selection
Robustness measures the sensitivity to changes in the input data: a robust algorithm provides (almost) the same outcome when the original data set is disturbed to some extent, e.g., by adding or removing a given set of instances Saeys et al. [2008]. Besides the model performance, robustness is an essential criterion for the feature selection process. It verifies that the FS algorithm remains stable when only small changes are made to the data set. Therefore, to assess the robustness of the various FS methods, we focus in what follows on the comparison of feature rankings. The traditional Consistency Index (IC) was used for the top 10 percent of the best rankings over the 100 iterations Kuncheva [2007]. The Consistency Index for two subsets Si and Sj, such that |Si| = |Sj|, is given by Equation 5:
$$I_C(S_i, S_j) = \frac{r d - k^2}{k (d - k)} \tag{5}$$
where d is the number of features in the data set, k = |Si| = |Sj|, and r is the cardinality of the intersection of subsets Si and Sj.
The overall stability of a feature selection algorithm for a set of sequences of features S 1 , S 2 , ..., S K (K = 100 in our case) can be defined as the average over all pairwise consistency indices (Equation 6):
$$\mathcal{S}(S_1, \dots, S_K) = \frac{2}{K (K - 1)} \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} I_C(S_i, S_j) \tag{6}$$
The more similar all outputs are, the higher the stability measure will be. Table 3 summarizes the results of the robustness analysis across the Facebook review data set for the different feature ranking methods. ANOVA is the least stable algorithm. RF, on the other hand, proves to be the most stable feature selection method. Thus, it seems that RF outperforms the other feature selection methods regarding robustness. It should be noted that robustness should ideally be considered together with performance, in order to enhance reliability and performance at the same time. Domain experts are not interested in a strategy that produces very robust features but a poorly performing model. For this reason, we rely on the robustness-performance trade-off (RPT) suggested by Saeys et al. [2008]. The RPT is a harmonic mean of robustness and performance that jointly assesses the trade-off between the two, as introduced in Equation 7.
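The consistency index and the overall stability described above can be illustrated with a few lines of Python. The sketch assumes Kuncheva's formulation, I_C = (r d - k^2) / (k (d - k)), with d, k, and r as defined in the text; the function names and the toy subsets are hypothetical.

```python
from itertools import combinations


def consistency_index(subset_a, subset_b, n_features):
    """Kuncheva's consistency index between two equally sized feature subsets."""
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if k != len(b):
        raise ValueError("both subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)  # number of features shared by both subsets
    return (r * n_features - k * k) / (k * (n_features - k))


def stability(subsets, n_features):
    """Average consistency index over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("at least two subsets are required")
    return sum(consistency_index(a, b, n_features) for a, b in pairs) / len(pairs)
```

Identical subsets give an index of 1, and subsets that overlap no more than expected by chance give values around 0, which is why the index is preferred over the raw overlap.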
$$RPT_\beta = \frac{(\beta^2 + 1) \times \text{Robustness} \times \text{Performance}}{\beta^2 \times \text{Robustness} + \text{Performance}} \quad (7)$$

The parameter β controls the relative importance of robustness versus performance and can therefore be used to put more weight on either one. The value β = 1 is the standard formulation, which treats robustness and performance as equally important. Table 4 shows the results for the three FS ranking algorithms (ANOVA, RF, and RFE) when only 10 percent of the features are used. The consistency index was used to measure robustness, while accuracy was used to measure performance. On this data set, RF yields a better RPT value than the other two feature ranking algorithms.
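The following sketch illustrates how the overall stability (the average pairwise consistency index) and the RPT score could be computed together. The subsets, the number of features d, and the accuracy value are made-up illustrative numbers, not results from Tables 3 or 4.

```python
from itertools import combinations


def consistency_index(subset_a, subset_b, d):
    """Kuncheva's consistency index for two equal-size feature subsets."""
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    r = len(a & b)
    return (r * d - k * k) / (k * (d - k))


def overall_stability(subsets, d):
    """Average consistency index over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(consistency_index(a, b, d) for a, b in pairs) / len(pairs)


def rpt(robustness, performance, beta=1.0):
    """Robustness-performance trade-off; beta = 1 gives the harmonic mean."""
    denominator = beta ** 2 * robustness + performance
    if denominator == 0:
        return 0.0
    return (beta ** 2 + 1) * robustness * performance / denominator


# Hypothetical top-3 subsets from three resampled runs (d = 30 features).
runs = [
    ["love", "like", "sad"],
    ["love", "like", "polarity"],
    ["love", "like", "sad"],
]
stability = overall_stability(runs, d=30)
accuracy = 0.9  # illustrative accuracy, not a reported result
print(f"stability = {stability:.3f}, RPT = {rpt(stability, accuracy):.3f}")
```

With β = 1 the score is pulled toward the smaller of the two inputs, so a method cannot obtain a high RPT by being very stable but inaccurate, or accurate but unstable.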

Feature-wise analysis
This section investigates the importance of each feature for helpfulness prediction. The goal of the experiments is to probe the contribution of each feature to the helpfulness of online reviews. Figure 7 presents the variable ranking obtained on the Facebook review data set using Random Forest as the feature selection method.
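A Random Forest based ranking of this kind can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic (generated with `make_classification`), the feature names are hypothetical labels borrowed from the paper's vocabulary, and the hyperparameters are arbitrary, so the resulting order says nothing about the real data set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature labels; the synthetic columns carry no real meaning.
feature_names = ["love", "like", "sad", "comments",
                 "shares", "polarity", "word_count", "smog"]

# Synthetic stand-in for the review data set (helpful / not helpful labels).
X, y = make_classification(
    n_samples=500,
    n_features=len(feature_names),
    n_informative=4,
    n_redundant=0,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importance weights, sorted from most to least important.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, weight in ranking:
    print(f"{name:12s} w = {weight:.4f}")
```

The weights are normalized to sum to 1, which is why individual values such as those reported for single features are small even for the top-ranked variables.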

Impact of baseline features on Facebook review helpfulness
From Table 5 and Figure 7, it can be observed that some linguistic features (W3SoM, Word, Syllab) have a significant influence on helpfulness prediction. Linguistic variables appear among the top 10 selected features for all three FS methods: RF, ANOVA (Adj, Adv), and RFE (Adj). The results echo a simple helpfulness evaluation assumption that "a longer review tends to be more helpful". They also suggest that an experienced customer tends to write a longer review, regardless of whether he or she likes the product, as such reviews appear to receive more consideration than short ones. Table 5 illustrates the importance of the review subjectivity (Polarity, Comp) in the helpfulness evaluation. Comp takes first place according to the RFE method (see Table 5), while Polarity is among the top 10 selected features using RF. The significance of the review text's sentiment and polarity suggests that reviews containing more words of emotion and sentiment acquire comparatively more helpful votes. Age and extremity are other important factors that caught our attention when evaluating review helpfulness. Extremity is among the top 10 selected features according to the three FS methods. The high impact of extremity illustrates that reviews with very high or very low ratings tend to be helpful. Besides, the readability variable (SMOG) also has an important impact on the helpfulness assessment (RF and ANOVA). This variable estimates the years of education needed to understand the reviews.
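To make the readability variable concrete, the sketch below computes the standard SMOG grade for a piece of text. The syllable counter is a crude vowel-group heuristic introduced here purely for illustration (an assumption, not the tool used in the study), and SMOG was originally defined for samples of 30 sentences, so values for short reviews are rough approximations.

```python
import math
import re


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups (heuristic only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def smog_grade(text):
    """SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllabic words are those with three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


simple = "Good. Nice. Fast."
complex_text = "The virtualization infrastructure delivers exceptional scalability."
print(round(smog_grade(simple), 2), round(smog_grade(complex_text), 2))
```

A higher grade indicates that more years of education are needed to read the review comfortably.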

Impact of Facebook behaviors on review helpfulness
From the results in Table 5 and Figure 7, likes and reactions have the highest importance, probably because these are the features consumers use most when they want to interact with a review. In particular, with the RF method, the love behavior obtained the highest importance weight (w = 0.1237) among the Facebook reactions and all other features, taking first place in the top 10 selected features (Table 5). Furthermore, as illustrated in Figure 6, the love reaction achieved outstanding performance with both classifiers (KNN and SVM) compared to the features selected by the other FS methods. According to the RF method, the second top-ranked feature is the like behavior, with an RF importance weight of 0.1187. This finding indicates that the like and love behaviors are strongly related to review helpfulness, since the gap between their RF importance weights and those of the other features is substantial. Meanwhile, the sad behavior (w = 0.0663) is the next most important Facebook behavior, taking fourth place.
Moreover, it appears in the top 10 selected features using the RF, ANOVA, and RFE methods. The angry, haha, and wow reactions are the next three behaviors, with lower rankings. Hence, these three reactions do not appear to impact the review helpfulness prediction. We cannot deny the slight importance of the commenting behavior, since it appears in the top 10 selected features using ANOVA and RFE. This suggests that customers may prefer reviews that receive more comments. The sharing behavior has the lowest predictive performance according to the RF selection method, as depicted in Figure 7. This may be explained by the fact that sharing only happens after some form of cognitive persuasion Kaur et al. [2019], Kim and Yang [2017] (only 1000 of the 5000 reviews have at least one share).

Model performance analysis
In this section, a series of experiments is conducted to construct the helpfulness prediction models using five popular machine learning techniques: RF, ADA, BAG, CART, and ID3. The five prediction models are trained and tested on the Facebook review data set using the 23 most important features according to the RF selection method. Our proposed helpfulness prediction model is quite effective, as it achieves a maximum accuracy of 0.925 and an AUC of 0.927 on a real-life Facebook data set.

Based on the results presented in Figures 6 and 7 and Table 5, the proposed review and Facebook behavior-based features prove to be useful predictors for improving helpfulness prediction. Each feature's importance is computed with different feature selection methods (Random Forest, ANOVA, and RFE) to analyze the impact of baseline features on Facebook review helpfulness. The findings indicate that the review linguistic, subjectivity, and characteristics (extremity and age) features are the most significant parameters for determining review helpfulness in the field of cloud computing, as presented in Table 5. Similar findings are supported by past studies Korfiatis et al. [2012], Wang et al. [2019], Lee and Choeh [2016], Ghose and Ipeirotis [2011]. The next most influential baseline feature for helpfulness is extremity. A previous study Lee and Choeh [2016] also showed that products with extreme ratings received more helpful reviews. Surprisingly, however, we found that the standard readability index does not rank among the critical features and consequently does not help evaluate Facebook review helpfulness. These results contradict the literature Ghose and Ipeirotis [2011], which may be explained by the data set used (Facebook reviews vs. Amazon.com).
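The five-classifier comparison described at the start of this section can be sketched with scikit-learn as below. Everything here is an assumption made for illustration: the data are synthetic with 23 features, hyperparameters are left at arbitrary or default values, and ID3 is only approximated by an entropy-based decision tree, since scikit-learn implements an optimized CART algorithm rather than ID3 itself. The printed scores therefore do not reproduce the reported accuracy or AUC.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 23 selected features and helpfulness labels.
X, y = make_classification(n_samples=1000, n_features=23,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "ADA": AdaBoostClassifier(random_state=0),
    "BAG": BaggingClassifier(random_state=0),
    "CART": DecisionTreeClassifier(criterion="gini", random_state=0),
    # Entropy-based tree used as a stand-in for ID3 (an approximation).
    "ID3": DecisionTreeClassifier(criterion="entropy", random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Probability of the positive class, needed for the AUC.
    scores = model.predict_proba(X_test)[:, 1]
    results[name] = (accuracy_score(y_test, predictions),
                     roc_auc_score(y_test, scores))

for name, (accuracy, auc) in results.items():
    print(f"{name:5s} accuracy = {accuracy:.3f}  AUC = {auc:.3f}")
```

Using a held-out test split (or cross-validation) keeps the accuracy and AUC estimates independent of the data the models were fitted on.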
Besides the baseline features, this study investigated the relationship between Facebook behaviors (likes, comments, sharing, and reactions) and review helpfulness. The results suggest that likes and reactions are the most significant features for determining review helpfulness (Table 5). In particular, the love reaction performs better than the other reactions. Moreover, Figure 4 indicates that love is the most used reaction. We can therefore endorse the hypothesis that Facebook users often use the love reaction to express their agreement with a review. The next most impactful behavior is the like button. Still, even the angry and sad reactions matter for review helpfulness prediction: they have a positive impact when the review is negative and the opposite (negative) impact when it is positive. This finding may be explained by the fact that users usually choose these two reactions to agree with the content of a negative review, and to express their anger about a positive review that they consider untrue. Meanwhile, the findings showed that the wow and haha reactions do not affect review helpfulness. The next most significant FB feature is the commenting behavior, which indicates that users are more likely to comment on helpful reviews. This is in line with previous studies Kaur et al. [2019], Kim and Yang [2017], which demonstrate that users are more likely to comment on posts that contain logical information. Surprisingly, we found that the sharing feature does not rank among the important variables, which implies that the sharing behavior does not significantly help evaluate review helpfulness. These results differ from those in the literature Kaur et al. [2019], Kim and Yang [2017], possibly because the type of data differs: this research uses reviews, whereas past studies used posts, and users tend to share posts more than reviews.

Implications
The findings of this research have several significant implications. This study addresses the problem of Facebook review helpfulness assessment in order to build a practical predictive approach for Facebook review helpfulness. Baseline features are studied on Facebook reviews. Furthermore, we have extended the literature by adding new helpfulness indicators, Facebook behaviors (likes, comments, sharing, and reactions), to make Facebook a source of reviews. In terms of impact, the love and like behaviors have the highest level, the commenting behavior has a medium level, and the sharing behavior has the lowest level. The study also has practical implications: its outcomes can be applied to develop a smart review ranking system for Facebook product pages. Notably, when a consumer is looking for reviews of a particular product on its official page, the system can automatically identify helpful reviews according to the combination of helpfulness features of the target product. This is a highly desirable capability, as official product pages can offer a deeper level of adaptive filtering. In addition, since online viewers usually have limited time to read many product reviews, such a system can help users quickly grasp the relevant information about the selected products and save time during their online shopping process.
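A review ranking step of the kind outlined above could look like the following sketch. It assumes that a trained helpfulness model has already assigned each review a probability of being helpful; the `ScoredReview` class, the `rank_reviews` function, the threshold, and the example values are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class ScoredReview:
    text: str
    helpful_probability: float  # assumed output of a trained classifier


def rank_reviews(reviews, top_n=3, threshold=0.5):
    """Keep reviews predicted as helpful and order them by predicted score."""
    helpful = [r for r in reviews if r.helpful_probability >= threshold]
    helpful.sort(key=lambda r: r.helpful_probability, reverse=True)
    return helpful[:top_n]


# Made-up reviews and scores for illustration only.
page_reviews = [
    ScoredReview("Great uptime and clear pricing details.", 0.91),
    ScoredReview("ok", 0.12),
    ScoredReview("Support resolved my storage issue within a day.", 0.78),
    ScoredReview("Bad.", 0.30),
]
for review in rank_reviews(page_reviews):
    print(f"{review.helpful_probability:.2f}  {review.text}")
```

Filtering by a threshold before sorting lets a product page hide reviews predicted as unhelpful while still showing the most helpful ones first.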

VI CONCLUSION
This study dealt with the problem of helpfulness prediction of Facebook reviews by building a model using five machine learning methods: RF, BAG, CART, ADA, and ID3. The influential features belong to three identified feature categories, namely Facebook behavior, review quality, and review characteristics. Overall, the hybrid set of features (Facebook behavior + review quality + review subjectivity) delivers the best predictive results. RF performs better than the BAG, CART, ADA, and ID3 classification methods. Moreover, the predictive performance of the proposed features was compared to that of the baseline features using the same data set and three FS methods: RF, ANOVA, and RFE. Experimental results showed that the proposed features (Facebook behaviors) outperform the baseline features in predicting review helpfulness across various evaluation metrics. Each feature's importance was then examined, and the list of influential features belonging to each category was highlighted. Variable importance measures revealed that the love, like, and sad behaviors are the most significant features. Thus, our proposed features are useful indicators for the helpfulness prediction of Facebook reviews. This study has theoretical as well as practical implications. It contributes to the body of knowledge by examining the effect of Facebook behaviors on helpfulness assessment. The results showed the considerable effect of this type of feature in determining helpful reviews. This will help researchers and practitioners understand and explore the online market and consumers' desires based on Facebook behaviors. In addition, review platforms could integrate these results in order to encourage consumers to write helpful reviews, for example by focusing on the linguistic and subjectivity features. However, the present study has some limitations.
First, because of the lack of Facebook reviews in cloud computing, this research uses only 5000 Facebook reviews to explore and evaluate the contribution of the proposed variables to Facebook review helpfulness. Future work should therefore include other types of products or different brands to enlarge the number of reviews and further strengthen the findings presented in this paper. The second limitation of this study is that only English reviews were considered. Non-English reviews may also provide useful consumer opinion information, which should not be neglected. Hence, future work could conduct experiments on other languages, such as Arabic.