Understanding Shilling Attacks and Their Detection Traits: A Comprehensive Survey

The internet is home to huge volumes of useful data that are constantly being created, making it difficult for users to find information relevant to them. A recommendation system is a special type of information filtering system adopted by online vendors to provide recommendations to their customers based on their requirements. Collaborative filtering is one of the most widely used recommendation approaches; unfortunately, it is prone to shilling/profile injection attacks. Such attacks alter the recommendation process to promote or demote a particular product. Over the years, multiple attack models and detection techniques have been developed to mitigate the problem. This paper aims to be a comprehensive survey of the shilling attack models, detection attributes, and detection algorithms. Additionally, we unravel and classify the intrinsic traits of the injected profiles that are exploited by the detection algorithms, which has not been explored in previous works. We also briefly discuss recent works in the development of robust algorithms that alleviate the impact of shilling attacks, attacks on multi-criteria systems, and implicit feedback based collaborative filtering methods.


I. INTRODUCTION
We live in the information age, where there is an overload of information generated by individuals, companies, and governments. The internet has become a common platform for all of this information to be shared and stored. Multiple e-commerce platforms have come into existence, selling all kinds of products and services. With this information overload, it has become increasingly difficult for online users to find content relevant to them. As a means of addressing this problem, many websites utilize recommender systems [1]. A recommender system is an information filtering mechanism that provides customers with products/services based on their requirements.
The associate editor coordinating the review of this manuscript and approving it for publication was Pasquale De Meo.
Multiple Recommender System approaches are employed to cater to different kinds of needs in different websites. Over the years, there has been a drastic growth in the methods used to improve recommendation results for different purposes [2]-[8]. Recommendation systems can be broadly classified into two types, content-based [9]-[14] and collaborative filtering-based [15]-[20].
Content-based filtering recommends products to users by comparing the content of the products to the users' profiles. The downside of content-based filtering is overspecialization: it tends to recommend only products that are very similar to what the user has already consumed, which is not the case with collaborative filtering. A collaborative filtering recommender system works by analyzing the past behavior of a user. The key idea is that users with similar behavior have similar needs and interests. Recommendations made using collaborative filtering depend on relationships between the users and items. Unfortunately, due to its openness and dependency on user ratings, collaborative filtering is prone to shilling attacks, also known as profile-injection attacks. A shilling attack [21]-[25] is a particular type of attack in which malicious user profiles are inserted into an existing collaborative filtering dataset to alter the outcome of the recommender system. The injected profiles explicitly rate items in such a way that the target item is either promoted or demoted. Shilling attacks have been a topic of study for over a decade, and multiple survey papers have covered different parts of this domain. In [26], Mehta et al. focus exclusively on robust collaborative filtering techniques and not on detection techniques or attack strategies. In [27], the types of attacks and the detection techniques discussed are limited. The 2014 survey in [28] is one of the most comprehensive on the topic, but it presents details on the attacks only until 2011. The survey in [23] focuses only on the statistical measures used in the detection and the basic shilling attack methods. Kaur and Goel [29] perform an experimental evaluation comparing the most commonly used shilling attack methods. In [30] and [22], the discussions do not consider the different detection attributes used in supervised and unsupervised detection methods. Both [24] and [25] briefly discuss the various attack and detection methods, but neither discusses robust algorithms or categorizes the detection methods.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This paper aims to be a comprehensive survey of different attack models and detection attributes for shilling attacks on collaborative filtering recommender systems. Since shilling attacks are more prominent in explicit rating systems, this survey's scope is limited to methods that work on explicit rating systems, where the user explicitly gives one rating for each item. Shilling attacks are possible in both nearest-neighbor-based and matrix factorization-based recommender systems; they are predominantly tested in nearest-neighbor settings, which we use in our explanations. We explain collaborative filtering with examples in Sect. 2. Sect. 3 discusses the attack profiles and the various attack models, and Sect. 4 contains the detection attributes. Sect. 5 details the detection algorithms along with the targeted traits, which are not discussed in earlier works. We also briefly introduce the impact of shilling attacks on multi-criteria and implicit feedback systems in Sect. 6. Finally, we conclude our paper and give some future directions in Sect. 7.

II. COLLABORATIVE FILTERING
Collaborative filtering uses user-item interaction data to make recommendations. It can be broadly divided into user-based and item-based collaborative filtering. Typically, a user-based collaborative filtering system consists of an m×n matrix with m users and n items. Each element in the matrix represents the rating given by the user for that item/product. The User-Item matrix, also referred to as the utility matrix, is incomplete, as most users would not have rated all the items. Each row of the utility matrix denotes the behavioral history of one user. Consider a system with only two users A and B, who have given similar ratings to products $p_1$, $p_2$ and $p_4$. If user B gives a high rating to product $p_3$, then $p_3$ will also be recommended to user A. The process is to find the top X users most similar to the target user u, then calculate the product ratings for user u based on the similar users' ratings. The top N products with high predicted ratings that have not yet been rated by user u are recommended. On the other hand, item-based collaborative filtering functions by forming an item-item matrix to determine the relationship between each pair of items. Here, the recommendations are based on the other items that the user has purchased. For example, consider that multiple users give high ratings to both product $p_1$ and product $p_2$. This causes $p_1$ and $p_2$ to have a high correlation in item-based CF. If a new user gives a high rating to $p_1$, then product $p_2$ will also be recommended to that user. The ability to work with sparse data and easier maintenance are some of the advantages that item-based CF has over user-based CF. Both these methods are widely used in different recommendation tasks depending on the system's requirements.
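The user-based procedure described above can be illustrated with a minimal sketch. It assumes a small dense utility matrix in which 0 marks an unrated item, cosine similarity over co-rated items, and a similarity-weighted average for prediction; the function name `predict_user_based` and these choices are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def predict_user_based(R, u, k=2):
    """Predict the unrated items of user u from the k most similar users.

    R is an m x n utility matrix; 0 marks an unrated item (an assumption
    of this sketch).
    """
    rated = R > 0
    sims = np.zeros(R.shape[0])
    for v in range(R.shape[0]):
        common = rated[u] & rated[v]
        if v == u or common.sum() < 2:
            continue  # skip the user itself and users with too little overlap
        a, b = R[u, common], R[v, common]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims[v] = a @ b / denom if denom else 0.0
    neighbours = np.argsort(-sims)[:k]  # top-k most similar users
    preds = {}
    for i in np.where(~rated[u])[0]:
        voters = [v for v in neighbours if rated[v, i] and sims[v] > 0]
        if voters:
            w = sims[voters]
            # similarity-weighted average of the neighbours' ratings
            preds[int(i)] = float(w @ R[voters, i] / w.sum())
    return preds

# Users A and B agree on p1, p2 and p4; B rates p3 highly, so p3 is predicted
# highly for A.
R = np.array([[5, 4, 0, 5],
              [5, 4, 5, 5],
              [1, 2, 1, 0]], dtype=float)
preds = predict_user_based(R, u=0, k=1)
```

In this toy matrix, user B is the single nearest neighbor of user A, so the predicted rating of the third item for A follows B's rating.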

III. SHILLING ATTACKS
Shilling attacks can be classified based on intent as a push or nuke attack, where a product is either promoted or demoted, respectively, to gain an economic advantage over competitors. Fig. 1 gives an example of the impact of a shilling attack on a recommender system. Here, item X is the target item that is promoted by the shilling attack. Over the years, multiple attack profiles and models have been developed [21], [31]-[36]. Simultaneously, many detection techniques and algorithms have emerged to counter such attacks [37]-[42]. Almost all of the attack models use the same attack profile while generating malicious users. The attack models' differences are attributed to how the individual elements of the attack profiles are formed.

A. ATTACK PROFILE
The attack profile is segmented into four sets: Selected items $I_s$, Filler items $I_f$, Null items $I_\emptyset$, and the Target item(s) $I_t$. $I_t$ is the set of items or an individual item which needs to be pushed or nuked. $I_s$ is the set of items carefully chosen so that a malicious profile has a similarity with the maximum possible number of genuine users. The efficiency of an attack is decided by how many users are recommended the target item, so $I_s$ plays a crucial role in attack efficiency. $I_f$ is the set of filler items chosen and rated in such a way that the malicious profiles are camouflaged among the genuine profiles. $I_\emptyset$ is the set of items that are not rated by the malicious user [43]. Fig. 2 illustrates an attack profile.

1) ATTACK SIZE
The number of injected profiles and the number of items rated per profile considerably influence an attack's reach. The number of injected profiles, also known as attack size, should be large enough to have any impact on the system. Fig. 3a shows the increase in the reach of the target item with respect to the attack size. The MovieLens dataset [44], with 100,000 movie ratings from 943 users on 1682 items, was used for the generation of this graph. A random attack, discussed in the next section, was implemented with various attack sizes (1% to 7% of the number of authentic users). A movie with an average rating of 1.9, calculated from 31 authentic ratings, was chosen as the target item. Before the attack, the target item was not part of the top-40 recommendations made to any of the authentic users using a kNN-based algorithm. Fig. 3a shows the number of users who have the target item in their top-10, top-20, and top-40 recommendations after the attack. The graph shows that the target item reaches more people as the attack size increases. The number of filler items per attack profile was fixed at 2% of the total number of items.

2) FILLER LENGTH
The number of filler items rated per injected profile is known as the filler size, or filler length. Fig. 3b shows the impact of increasing the filler length on the target recommendation. For the evaluation of this graph, the number of injected profiles was fixed at 3% of total users. From this graph, it can be seen that increasing the filler length can be detrimental to attack efficiency, implying that a high filler length can cause the attack profiles to be less similar to authentic users.

B. ATTACK MODELS
Based on the attackers' motivation and knowledge, multiple attack models have been developed over the years. All these attacks can be categorized either as a high-knowledge attack or a low-knowledge attack. Low-knowledge attacks are more practical and have a higher chance of having a real-world impact, but the efficiency of such attacks is also low. On the other hand, high-knowledge attacks can have a massive effect on Recommender Systems' performance, but they are harder to pull off. From a practical standpoint, an inside job is a viable option to execute a high-knowledge attack, the chances of which are negligible. So, in real-world applications, a moderately efficient low-knowledge attack poses a more significant threat than a highly efficient high-knowledge attack. Based on how the selected items and filler items are chosen, multiple attack models exist which can further be classified as standard or obfuscated, based on the attacks' ability to go undetected.

1) STANDARD ATTACKS
These are the attack models that do not make an exclusive attempt to go undetected in a recommender system. Many detection algorithms have a higher chance of detecting the shilling attack profiles injected using these attacks.
Random Attack [21], also known as the RandomBot attack, is the simplest form of shilling attack. In this model, the items rated by the attack profile are chosen at random, except for the target item. The ratings for these items are distributed around the overall system mean. The target item gets the maximum or minimum rating depending on whether it is a push or a nuke attack. Some attacks, known as random vandalism [30], are intended only to disrupt the trustworthiness of a recommender system. Being the most straightforward attack, the random attack is also the least effective: it is usually better at disrupting the performance of a Recommender System than at promoting the target item. Random attacks are easy to execute because of their low-knowledge requirement. All that the attacker needs is the overall system mean, which can easily be estimated empirically.
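A random attack profile can be sketched as follows. The sketch assumes a 1 to 5 rating scale, 0 as the marker for null (unrated) items, and filler ratings drawn from a normal distribution around the system mean and rounded to the scale; the standard deviation parameter and the function name `random_attack_profile` are assumptions made for illustration.

```python
import numpy as np

def random_attack_profile(n_items, target, filler_size, global_mean,
                          global_std, r_min=1, r_max=5, push=True, rng=None):
    """Build one random-attack profile as a length-n_items rating vector."""
    rng = np.random.default_rng() if rng is None else rng
    profile = np.zeros(n_items)  # 0 = unrated (null items)
    # Filler items are drawn uniformly at random, excluding the target.
    candidates = np.setdiff1d(np.arange(n_items), [target])
    fillers = rng.choice(candidates, size=filler_size, replace=False)
    # Filler ratings are centred on the overall system mean.
    ratings = rng.normal(global_mean, global_std, size=filler_size)
    profile[fillers] = np.clip(np.rint(ratings), r_min, r_max)
    # The target gets the extreme rating: maximum for push, minimum for nuke.
    profile[target] = r_max if push else r_min
    return profile

p = random_attack_profile(50, target=7, filler_size=5, global_mean=3.6,
                          global_std=1.1, rng=np.random.default_rng(0))
```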
Average Attack [21] is similar to the random attack in terms of the item selection process. The randomly chosen items are rated based on the rating distribution of the individual items. Each filler item is assigned the mean rating of that item. This attack is feasible only if the attacker has immense knowledge about the dataset on which the recommender system is built. The effectiveness of this model is proportional to the attacker's knowledge. Though the only difference between random attack and average attack is the filler ratings, the average attack's effectiveness is much better.
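For comparison, the average attack differs only in how the filler items are rated. The following sketch assumes the attacker can observe the full utility matrix `R` (0 = unrated), which is exactly the high-knowledge requirement noted above; the function name and the rounding of item means to the rating scale are illustrative assumptions.

```python
import numpy as np

def average_attack_profile(R, target, filler_size, r_min=1, r_max=5,
                           push=True, rng=None):
    """Build one average-attack profile from a utility matrix R (0 = unrated)."""
    rng = np.random.default_rng() if rng is None else rng
    n_items = R.shape[1]
    counts = (R > 0).sum(axis=0)
    item_means = R.sum(axis=0) / np.maximum(counts, 1)
    # Random filler selection, as in the random attack, restricted to items
    # that have at least one rating so that a mean exists.
    candidates = np.array([i for i in range(n_items)
                           if i != target and counts[i] > 0])
    fillers = rng.choice(candidates, size=filler_size, replace=False)
    profile = np.zeros(n_items)
    # Each filler item receives (the rounded value of) its own mean rating.
    profile[fillers] = np.clip(np.rint(item_means[fillers]), r_min, r_max)
    profile[target] = r_max if push else r_min
    return profile

R = np.array([[5, 3, 0, 1],
              [4, 0, 2, 1],
              [5, 3, 2, 0]], dtype=float)
p = average_attack_profile(R, target=3, filler_size=3,
                           rng=np.random.default_rng(0))
```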
Bandwagon Attack [31], [33] is the type of attack where the profiles generated by attackers are filled with popular items with high ratings. The attack profiles are naturally closer to a large number of users. The target item is given the highest rating. This attack can be further divided into bandwagon-random and bandwagon-average depending on the rating scheme used for the filler items. Bandwagon also falls under the low-knowledge attack category since the attacker only needs publicly available data.
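A sketch of the bandwagon-random variant is given below. It assumes popularity is measured by the number of ratings per item, that the popular (selected) items receive the maximum rating, and that the remaining fillers are rated around the system mean with unit standard deviation; these specifics and the function name are assumptions of the example.

```python
import numpy as np

def bandwagon_attack_profile(R, target, n_popular, filler_size,
                             r_min=1, r_max=5, rng=None):
    """Build one bandwagon-random push profile (0 = unrated)."""
    rng = np.random.default_rng() if rng is None else rng
    n_items = R.shape[1]
    popularity = (R > 0).sum(axis=0).astype(float)
    popularity[target] = -1  # never pick the target as a selected item
    selected = np.argsort(-popularity, kind="stable")[:n_popular]
    # Fillers come from the items that are neither selected nor the target.
    rest = np.setdiff1d(np.arange(n_items), np.append(selected, target))
    fillers = rng.choice(rest, size=filler_size, replace=False)
    global_mean = R[R > 0].mean()
    profile = np.zeros(n_items)
    profile[selected] = r_max  # popular items get high ratings
    noise = rng.normal(global_mean, 1.0, size=filler_size)
    profile[fillers] = np.clip(np.rint(noise), r_min, r_max)
    profile[target] = r_max
    return profile

R = np.array([[5, 4, 0, 0, 1, 0],
              [4, 5, 3, 0, 0, 0],
              [5, 4, 0, 2, 0, 0],
              [4, 0, 0, 0, 0, 1]], dtype=float)
p = bandwagon_attack_profile(R, target=5, n_popular=2, filler_size=2,
                             rng=np.random.default_rng(0))
```

Replacing the filler rating rule with per-item means would give the bandwagon-average variant.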
Reverse Bandwagon Attack [32], [33] is the exact reversal of a bandwagon attack. This attack is used to nuke the target product by giving low ratings to items that have many negative reviews and giving the lowest rating to the target item. It is also a low-knowledge attack, just like the bandwagon attack. Though it is highly similar to the bandwagon attack, the efficiency of the reverse bandwagon attack is slightly better.
Segmented Attack [45] targets a specific group of users who are likely to purchase the target item in an e-commerce setup. Segment attacks are usually deployed in item-based collaborative filtering. The rated items and the ratings are based on the attacker's knowledge about the segment. The significant advantage that this method has over other methods is its ability to reach potential customers. For example, if the target item is a book in the science fiction genre, then the selected items will also be from the same genre. Such selection increases the chances of the target book reaching more fans of science fiction. Since the attack is deployed only in a segment of the system, the impact is high.
Probe Attack [46] is not an attack that can be generalized for all systems. Some recommender systems display a predicted rating score for each of the items. The attacker uses these predictions to rate the items so that the attack profile resembles other users. The attacker gives genuine ratings to some seed items. Then, when the recommender suggests more items, the attacker forms the rated items list based on these items. This scheme ensures that the attack profiles stay close to their neighbors. It also enables the attacker to learn more about the system.
Love/Hate Attack [32] is a highly effective nuke attack. Here, the attacker randomly chooses filler items and gives them the highest rating, while giving the lowest rating to the target item. Despite the simplicity of this model, the effectiveness is surprisingly high. Though it was predominantly designed for nuke attacks, it can also be used for a push attack by reversing the ratings, although the push variant is not as effective as the nuke variant. Table 1 comprehensively summarizes the differences in various attack models.

2) OBFUSCATED ATTACKS
To evade detection algorithms, attackers try to obfuscate their attack signature. Many models incorporate slight modifications to the standard attack techniques to achieve obfuscation. Fig. 4 shows which of the standard attacks have influenced which of the obfuscated ones. The dotted lines indicate a direct influence between the attacks. The ones that are not derived from specific standard attacks can be combined with any standard attack. Though obfuscation might slightly reduce the impact of the attack, it is better than being detected.
Noise Injection [47] adds a Gaussian-distributed random number, multiplied by a constant, to each rating for a subset of injected profiles. The degree of obfuscation depends on the constant used as the multiplier. It can be effectively applied to all of the standard attack methods to obfuscate their signatures. Since the rating scheme is affected by noise injection, a slight but observable drop in the attack efficiency can be noticed.
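Noise injection can be sketched as a post-processing step on already generated attack profiles. The sketch assumes standard normal noise scaled by a constant `alpha`, a randomly chosen fraction of profiles, and clipping to a 1 to 5 scale without rounding; the parameter names are assumptions for illustration.

```python
import numpy as np

def noise_injection(profiles, alpha=0.2, fraction=0.5, r_min=1, r_max=5,
                    rng=None):
    """Add alpha * N(0, 1) noise to every rating of a subset of profiles."""
    rng = np.random.default_rng() if rng is None else rng
    out = profiles.astype(float).copy()
    n = out.shape[0]
    # Only a subset of the injected profiles is obfuscated.
    chosen = rng.choice(n, size=int(round(fraction * n)), replace=False)
    for u in chosen:
        rated = out[u] > 0  # unrated items (0) stay unrated
        noise = alpha * rng.standard_normal(rated.sum())
        out[u, rated] = np.clip(out[u, rated] + noise, r_min, r_max)
    return out, np.sort(chosen)

profiles = np.array([[5, 3, 0, 4],
                     [5, 0, 2, 3],
                     [5, 3, 3, 0],
                     [5, 4, 0, 2]], dtype=float)
noisy, chosen = noise_injection(profiles, alpha=0.2, fraction=0.5,
                                rng=np.random.default_rng(1))
```

A larger `alpha` gives stronger obfuscation at the cost of moving the ratings further from the values the underlying attack model intended. On a discrete rating scale the noisy values would additionally be rounded.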
User Shifting [47] is an obfuscation tactic where a subset of the rated items of each injected profile is modified. The ratings of this subset of items are either increased or decreased to reduce the similarity between attack profiles. For different groups of the injected profiles, different subsets of rated items have their ratings modified.
Target Shifting [47] shifts the rating of the target item to one level below the highest possible rating in push attacks. In nuke attacks, the target rating is shifted to one level above the lowest possible rating. This strategy is specifically useful in evading detection methods that penalize users who give extreme ratings to items. If the target item is already popular, it will be harder to push while employing target shifting obfuscation. In such cases, some other obfuscation method should be used.
Average Over Popular [48] is a technique used to obfuscate the Average Attack. Here, the filler items are chosen from the top X% of the most popular items with equal probability. This method is much more effective than randomly choosing from the entire collection of items. The choice of X influences the detectability of the attack.
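The filler selection of the Average Over Popular obfuscation can be sketched as below. The example assumes that popularity is the number of ratings an item has received and that fillers are rated with their rounded item means, as in the average attack; the function name and the `top_percent` parameter (the X in the description) are illustrative.

```python
import numpy as np

def aop_attack_profile(R, target, filler_size, top_percent=20.0,
                       r_min=1, r_max=5, push=True, rng=None):
    """Average attack whose fillers come only from the top X% popular items."""
    rng = np.random.default_rng() if rng is None else rng
    counts = (R > 0).sum(axis=0)
    item_means = R.sum(axis=0) / np.maximum(counts, 1)
    # Items sorted by popularity (number of ratings), target excluded.
    order = np.argsort(-counts, kind="stable")
    order = order[order != target]
    # Size of the candidate pool; never smaller than the filler size.
    n_top = max(filler_size, int(np.ceil(len(order) * top_percent / 100)))
    # Every item in the pool is equally likely to be picked.
    fillers = rng.choice(order[:n_top], size=filler_size, replace=False)
    profile = np.zeros(R.shape[1])
    profile[fillers] = np.clip(np.rint(item_means[fillers]), r_min, r_max)
    profile[target] = r_max if push else r_min
    return profile

counts = [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]
R = np.zeros((5, 10))
for j, c in enumerate(counts):
    R[:c, j] = 4  # item j is rated 4 by its first c users
p = aop_attack_profile(R, target=9, filler_size=2, top_percent=40,
                       rng=np.random.default_rng(0))
```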
Mixed Attack [49] is done by using the random, average, bandwagon, and segmented attacks in equal proportions, simultaneously. The detection technique should have the ability to detect all of the standard attacks to be successful. The different attack methods are used to push/nuke the same target item. It helps in evading multiple detection techniques.
Power Item Attack [36], [50] utilizes the power items, which are chosen based on one of three methods. Power items are defined as the set of items that can influence the largest group of items. These items effectively alter the recommendations made for other users. In PIA-AS, the top-N items with the highest aggregate similarity are chosen to be the power items. Such similarity is possible only when a considerable number of users have rated the same two items. In PIA-ID, the In-Degree centrality is the criterion for choosing the power items. The similarity of each pair of items is calculated using weighted significance and the top-N of each item is selected. PIA-NR chooses the items with the highest number of user ratings as the power items.
Power User Attack [36], [50], similar to PIA, chooses the set of users who have the maximum influence on the broadest group of users. In PUA-AS, the top X users with the highest Aggregate Similarity are chosen as the power users. In PUA-ID, the users who participate in the highest number of neighborhoods are selected as power users, based on the In-Degree centrality concept. The power users in PUA-NR are the users with the highest number of ratings in their profile.
SAShA [34] is an attack strategy that uses the semantic features extracted from a knowledge graph to improve the performance of standard CF attack models. A knowledge graph is a structured repository of factual, categorical, and ontological information [51]. This attack works by computing the semantic similarity between the knowledge graph derived features of the target item and all other items in the system. This information is leveraged to generate the most efficient set of filler items.
In [35], Chen et al. describe a method to use both rated item correlation and item popularity to generate malicious users with strong attack ability and similarity to real users. In their approach, each malicious user profile is generated individually. The rated items of a profile are selected based on a matrix of real user profiles.
As soon as the vulnerability of Collaborative Filtering to shilling attacks was discovered, various detection techniques were also constructed. We can broadly classify these techniques into supervised and unsupervised detection techniques. In literature, there is an array of detection attributes that govern these methods.

IV. DETECTION ATTRIBUTES
The attributes that differentiate the shilling profiles from the authentic profiles are considered as the detection attributes. The detection attributes that are designed to work irrespective of the type of attack model are known as Generic attributes.

A. GENERIC ATTRIBUTES
The attributes that are not tailored for specific attack models fall under this category. The efficiency of these attributes varies with the attack model used. Table 2 gives the definitions for the symbols used in the explanations below.
Rating Deviation from Mean Agreement (RDMA) is the measure of rating deviation of a user on a set of target items with respect to other users, combined with the inverse rating frequency of these items [37].
Weighted Deviation from Mean Agreement (WDMA) is closely based on the RDMA attribute. The significant difference is that it places a high weight on rating deviations for sparse items. WDMA was found experimentally to give higher information gain [38].
Weighted Degree of Agreement (WDA) captures the cumulative differences of a user's rating of an item from the item's average rating, divided by the number of ratings for the item. WDA is empirically the same as the numerator of the RDMA [38].
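The three deviation-based attributes can be computed together, which also makes the relation between WDA and RDMA explicit. The sketch follows the commonly stated formulations: for user u with rated items $I_u$, $WDA_u = \sum_{i \in I_u} |r_{u,i} - \bar{r}_i| / NR_i$, $RDMA_u = WDA_u / |I_u|$, and $WDMA_u = \frac{1}{|I_u|} \sum_{i \in I_u} |r_{u,i} - \bar{r}_i| / NR_i^2$, where $\bar{r}_i$ is the mean rating of item i and $NR_i$ its number of ratings. The symbol names here are assumptions of this example and may differ from those in Table 2.

```python
import numpy as np

def deviation_attributes(R):
    """Return (RDMA, WDMA, WDA) per user for a utility matrix R (0 = unrated)."""
    rated = R > 0
    counts = rated.sum(axis=0)                  # NR_i: ratings per item
    safe = np.maximum(counts, 1)                # avoid division by zero
    item_means = R.sum(axis=0) / safe           # mean rating of each item
    dev = np.abs(R - item_means) * rated        # |r_ui - mean_i| on rated cells
    n_u = np.maximum(rated.sum(axis=1), 1)      # items rated by each user
    wda = (dev / safe).sum(axis=1)              # numerator of RDMA
    rdma = wda / n_u
    wdma = (dev / safe ** 2).sum(axis=1) / n_u  # heavier weight on sparse items
    return rdma, wdma, wda

R = np.array([[4, 0],
              [2, 2],
              [0, 4]], dtype=float)
rdma, wdma, wda = deviation_attributes(R)
```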
Length Variance (LengthVar) measures the difference in the length of a user's profile from the average length of a profile. Here, length denotes the number of items rated by a given user profile. Some attack profiles tend to have too many rated items, deviating substantially from an average user's length [38].
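LengthVar is commonly written as $LengthVar_u = |n_u - \bar{n}| / \sum_{k \in U} (n_k - \bar{n})^2$, where $n_u$ is the number of items rated by user u and $\bar{n}$ is the average profile length. The sketch below computes it under that assumed formulation, with 0 marking unrated items.

```python
import numpy as np

def length_var(R):
    """LengthVar per user: deviation of the profile length from the average
    length, normalised by the total squared deviation over all users."""
    lengths = (R > 0).sum(axis=1).astype(float)
    mean_len = lengths.mean()
    denom = ((lengths - mean_len) ** 2).sum()
    if denom == 0:  # all profiles have the same length
        return np.zeros_like(lengths)
    return np.abs(lengths - mean_len) / denom

R = np.array([[5, 0, 0, 0],
              [0, 3, 0, 0],
              [4, 4, 4, 4]], dtype=float)
lv = length_var(R)  # the long third profile stands out
```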

B. MODEL SPECIFIC ATTRIBUTES
The problem with using only the generic attributes is that they are sometimes unable to distinguish malicious profiles from authentic users, especially when authentic users exhibit unusual behavior. Attack specific attributes were constructed to overcome these shortcomings. These detection attributes discover the partitions in user profiles so that their behaviors exhibit similarity to one particular attack model.
Mean Variance (MeanVar) is used to detect average attacks. It partitions the attack profiles into three parts: the items with extreme ratings (target items), all other rated items in profiles (filler items), and unrated items. This attribute works by computing the mean-variance between all the filler items and the overall average. A low variance would indicate the possibility of an average attack [38].
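A minimal sketch of MeanVar follows. It assumes a push attack, so items rated with the maximum value form the suspected target partition, and it uses each item's own mean rating as the reference, as in the usual formulation $MeanVar_u = \frac{1}{|P_F|} \sum_{j \in P_F} (r_{u,j} - \bar{r}_j)^2$ with $P_F$ the filler partition; if the overall system average is intended instead, `item_means` would be replaced by that constant. The function name and partitioning rule are assumptions of this example.

```python
import numpy as np

def mean_var(R, u, r_extreme=5):
    """MeanVar of user u: mean squared deviation of the filler ratings from
    the corresponding item means (0 = unrated)."""
    rated = R > 0
    item_means = R.sum(axis=0) / np.maximum(rated.sum(axis=0), 1)
    # Filler partition: rated items that do not carry the extreme rating.
    filler = rated[u] & (R[u] != r_extreme)
    if not filler.any():
        return 0.0
    return float(np.mean((R[u, filler] - item_means[filler]) ** 2))

R = np.array([[5, 1, 3, 4],
              [1, 5, 3, 2],
              [3, 3, 3, 5]], dtype=float)
mv_genuine = mean_var(R, 0)
mv_suspect = mean_var(R, 2)  # fillers sit exactly on the item means
```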
Filler Mean Target Difference Model (FMTD) targets the segmented attack model. This attribute relies on the difference between ratings of the items in target partition and the items in filler partition [38].
Filler Average Correlation (FAC) focuses on detecting the random attack model. When a random attack is executed, then the ratings given to the items are chosen at random. This attribute calculates the correlation between the ratings in the profile and the average ratings of the items. The correlation is expected to be low for random attacks [39].
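FAC can be sketched as a Pearson correlation between a profile's ratings and the item averages. For brevity, the example below correlates over all rated items of the profile rather than separating out the target item, and the item means include the profile's own ratings; both are simplifying assumptions of this sketch.

```python
import numpy as np

def filler_average_correlation(R, u):
    """Pearson correlation between user u's ratings and the item means."""
    rated = R > 0
    item_means = R.sum(axis=0) / np.maximum(rated.sum(axis=0), 1)
    x, y = R[u, rated[u]], item_means[rated[u]]
    # The correlation is undefined for constant vectors; report 0 instead.
    if len(x) < 2 or x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])

R = np.array([[1, 3, 5],
              [1, 3, 5],
              [5, 3, 1]], dtype=float)
fac_agree = filler_average_correlation(R, 0)     # follows the item averages
fac_disagree = filler_average_correlation(R, 2)  # opposes the item averages
```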
Filler Mean Difference (FMD) utilizes the fact that the filler items have a mean rating similar to the overall system average in the random attack model. If the mean ratings are similar, then the user profile could potentially be a random attack profile [39].

V. DETECTION ALGORITHMS
The detection algorithms can be broadly classified into two: Supervised detection methods and Unsupervised detection methods. The supervised techniques require the data to be labeled during the training process, whereas the unsupervised approaches do not. The availability of labeled ground truth is minimal in the recommender system datasets. This downside has led to unsupervised approaches being adopted more than supervised in recent times.

A. TARGETED TRAIT
Most of the detection algorithms work by targeting a particular trait observed in the shilling attacks. Though obfuscation manages to evade detection to some extent, some innate qualities need to be present in an attack, to be effective. Such qualities are usually targeted by the detection algorithms, both in the supervised classification and the unsupervised clustering methods. We briefly discuss what some of those qualities are in this section.

1) USER-BASED TRAITS
The basic division of such detection traits comes from whether the detection algorithm focuses on finding the attack user profiles or the items. In the user-based traits, the user's behavior is checked for anomalies, which can imply that the profile is fake.
1) Similarity: Most attack profiles exhibit similarity to a large number of their neighbors.
2) Size: The size of an attack, i.e., the number of attack profiles injected, is much smaller than the entire user set. This size difference, combined with the high similarity among the attack profiles, proves to be a useful resource in detection.

2) ITEM-BASED TRAITS
Most of the detection methods rely on the set of items rated by each profile to check if it is a fake profile or not. From a detection point of view, we can categorize the items in an attack profile into two groups.
1) Rated Items: Rated items are the items that are used for supporting the push/nuke of the target item. Both the selected and filler items fall into this category from a detection front.
Length: The length of an attack profile, i.e., the number of items rated by an attack profile, is usually much higher than that of an ordinary profile. An attacker usually tries to increase the similarity between the attack profile and many other profiles by rating several filler items.
Rating: The rating given to an item is kept close to the average rating of the item to ensure maximum similarity. Detection algorithms usually target such anomalous rating behaviors.
2) Target Item: The target item is the item that is promoted or demoted in an attack.
Crowding: The concentration of users rating a target item will be abnormally high when an attack is executed. Such abnormalities have a sizeable effect on the overall rating of the item.
Rating: The primary reason behind an attack is to modify the opinion about the target item among users. The opinion cannot be altered without giving the target item a high rating in the case of a push attack and the lowest possible rating in the case of a nuke attack. Usually, such ratings deviate widely from the authentic ratings given to the item.

B. SUPERVISED APPROACHES
The shilling attack problem was treated as a classification problem by Chirita et al. [37], who used RDMA and DegSim as the feature metrics for detecting malicious profiles. The method was developed to detect random and bandwagon attacks. Later on, two more generic metrics, namely WDMA and WDA, were added by Burke et al. [38] to improve the classifier's performance. SVM, kNN, and C4.5 were the most commonly used classifiers for the detection of fake injected profiles. The problem with using the generic attributes was that many authentic users who had extreme behaviors were misclassified as shilling profiles. To overcome this problem, as well as to improve the accuracy of the classifications, attack specific attributes were formulated in [38], [39]. Different attack specific attributes were formed for average, random, segment, and bandwagon attacks.
Williams et al. [52] utilized three strategies to increase the accuracy of detection in the supervised approaches: similarity to reverse-engineered attacks, target concentration, and rating anomaly detection. This detection technique is effective because of the added robustness to the system, but it is highly reliant on the classifier's choice. Their study shows that combining various attributes improves the classifier's performance, especially the support vector machine, and significantly reduces the impact of the most potent attack models. The attributes used in their method are RDMA, WDMA, DegSim, LengthVar, MeanVar, FMD, FAC, and FMTD.
The use of meta-learning was introduced by [53] to improve the precision of the detection. This algorithm can be considered a two-step process where the base-level training is done on attack profiles and available ratings. The second step is to combine the base-level output with the meta-level input for final attack detection. This algorithm had higher precision than previous methods. The diversity of the classifiers reduces the correlation of misclassification, positively impacting the meta-level prediction. The approach was tested against single SVM and voting SVM and was experimentally shown to be more effective. The attributes used in their method are WDMA, RDMA, WDA, LengthVar, DegSim, MeanVar, FMD, and FAC.
SVM-TIA [54] had supervised, unsupervised, and semi-supervised detection approaches. The pitfall of the supervised approach was that it needs balanced data, meaning that there should be an equal number of authentic profiles and attack profiles. The accuracy of the supervised approach was lower than that of their unsupervised approach, which involved clustering and statistical methods. It is a two-phase process where rough detection results are obtained in the first phase by alleviating class imbalance. In the second phase, the potential attack profiles are analyzed to discover the target profiles. Model-specific attributes like FMTD, MeanVar, FAC, and FMD are used in this method.
As mentioned earlier, the imbalance in the data available skewed the outcome of the supervised learning classifiers. AdaBoost was incorporated in [41] to diminish the perturbation caused by the imbalance. The authors first ease the hard classification task by using well designed features for the user profiles. It was achieved by applying weights to the various observations to accentuate the poorly modeled samples. This process was done repetitively to strengthen the correction of misclassification. The attributes used are RDMA, WDMA, WDA, LengthVar, MeanVar, FMTD, and FAC. In addition, they also use attributes that detect filler size with unpopular items.
Hao et al. [55] employed an ensemble detection method on features extracted from ratings, item popularity, and the user-user graph. The feature extraction is performed using Stacked Denoising AutoEncoders and PCA, which automatically extract user features with different corruption rates. The method uses a three-stage process involving data preprocessing, feature extraction, and detection using weak classifiers. The novelty of items, i.e., the degree of difference between various items, was also used as a feature. Table 3 explains the different traits used for detection in some of the algorithms. It also discusses the various assumptions based on which the algorithms are built.

C. UNSUPERVISED APPROACHES
The initial unsupervised approach introduced by Mehta et al. [56] applied Principal Component Analysis to the profile detection problem. Four factors make this problem suitable for PCA: spam users are highly correlated, they have a low deviation from the mean rating value, they have a high similarity with a large number of users, and spam users are assumed to work together. All the user profiles in the recommender system were projected onto a hyperplane formed from the user-item matrix. The user profiles which were clustered closer to the origin of the hyperplane were the attack profiles. The sparsity of the user-item matrix makes it harder for these predictions to be reliable. RDMA and WDMA are also used as detection attributes.
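The idea of flagging the profiles closest to the origin can be sketched as follows. This is a simplified illustration rather than the procedure of [56]: it assumes unrated entries are stored as 0, z-scores each user's row, treats users as the variables of the PCA, and scores each user by the squared loadings on the leading components. The function name and the parameters `n_components` and `n_flag` are assumptions.

```python
import numpy as np

def pca_var_select(R, n_components=3, n_flag=5):
    """Return indices of the n_flag users with the smallest contribution to
    the leading principal components of the user-user covariance."""
    X = R.astype(float)
    # z-score each user's row so that every user has comparable scale.
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd
    cov = Z @ Z.T / Z.shape[1]              # user x user covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    score = (top ** 2).sum(axis=1)          # squared distance from the origin
    return np.argsort(score)[:n_flag]       # closest to the origin first

rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(20, 15))
flagged = pca_var_select(R, n_components=3, n_flag=4)
```

The number of flagged users has to be chosen in advance, which in practice requires an estimate of the attack size.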
Bryan et al. [57] formulated a generic attribute that aids in the detection of attack profiles in an unsupervised manner. Their approach treats attack profile detection as an anomalous structure detection problem. The metric used is a variation of the Hv-score metric, which was initially used in gene data analysis to aid in locating biclusters. This algorithm, called UnRAP, appears useful in detecting both standard and obfuscated attacks. Their approach has a better chance of catching future novel attack strategies that may escape supervised methods.
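As a rough illustration of such a structure-based score, the sketch below assumes the mean-squared-residue form known from biclustering: the squared residue of each rating after removing the user mean, item mean, and overall mean, normalized by the user's own rating variance. The exact definition used in [57] should be checked against that paper; the dense toy matrix and the function name are assumptions.

```python
def hv_scores(matrix):
    """Per-user residue score for a dense user-item rating matrix.

    Assumed form: sum_i (r_ui - mean_i - mean_u + mean_all)^2
                  divided by sum_i (r_ui - mean_u)^2.
    """
    n_users, n_items = len(matrix), len(matrix[0])
    user_mean = [sum(row) / n_items for row in matrix]
    item_mean = [sum(matrix[u][i] for u in range(n_users)) / n_users
                 for i in range(n_items)]
    all_mean = sum(user_mean) / n_users
    scores = []
    for u, row in enumerate(matrix):
        num = sum((r - item_mean[i] - user_mean[u] + all_mean) ** 2
                  for i, r in enumerate(row))
        den = sum((r - user_mean[u]) ** 2 for r in row)
        # A constant profile has no variance; the score is undefined and is
        # treated as maximal here (an assumption of this sketch).
        scores.append(num / den if den > 0 else float("inf"))
    return scores

# Users 0-2 follow the same item preferences; user 3 rates against them.
matrix = [
    [4, 2, 5, 3],
    [3, 1, 4, 2],
    [4, 2, 5, 3],
    [1, 5, 1, 5],
]
scores = hv_scores(matrix)
most_anomalous = max(range(len(scores)), key=lambda u: scores[u])
```

Under this formulation, a profile that fits the shared user-item structure poorly receives a larger value, so ranking users by score surfaces the structurally anomalous ones first.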
Based on the assumption that attack profiles are fewer in number and exhibit high similarity, [49] applied an attribute-based k-means clustering technique. The users were divided into two clusters, and the smaller cluster was identified as the attack profiles. This method showed higher accuracy and lower misclassification of genuine users. Irrespective of the attack strategy used, this work claims to have fewer authentic user misclassifications than previous methods. The attributes used include RDMA, WDMA, WDA, and LengthVar, along with the Hv-score metric used in [57].
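The two-cluster idea can be sketched in a few lines. The feature vectors below are hypothetical stand-ins for detection attributes, and the deterministic initialization (first point plus the point farthest from it) is a choice made for this example, not a detail taken from [49].

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, iters=50):
    """Plain k-means with k = 2; returns a cluster label per point."""
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    centers = [list(c0), list(c1)]
    labels = [0] * len(points)
    for _ in range(iters):
        new_labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
                      for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:       # assignments stable: converged
            break
        labels = new_labels
    return labels

def flag_smaller_cluster(points):
    """Indices of the users that fall in the smaller of the two clusters."""
    labels = two_means(points)
    minority = 0 if labels.count(0) < labels.count(1) else 1
    return [i for i, lab in enumerate(labels) if lab == minority]

# Hypothetical two-attribute feature vectors, one per user.
features = [
    (0.20, 0.10), (0.30, 0.20), (0.25, 0.15),
    (0.10, 0.10), (0.35, 0.20), (0.20, 0.25),
    (0.90, 0.80), (0.95, 0.85),
]
suspects = flag_smaller_cluster(features)
```

The sketch makes the underlying assumption explicit: if attackers outnumber genuine users, or if no attack is present, the smaller cluster no longer corresponds to attack profiles.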
Chung et al. [58] applied a Beta distribution based algorithm to detect attacks. The method aims to detect as many attacks as possible without penalizing authentic users. Most of its problems are inherited from the Beta probability distribution itself. Its upsides are a low false alarm rate and a high detection rate, which are reported to hold even with a small attack size. The method also claims to work with sparse data and an unbalanced attack-normal profile ratio.
Another clustering approach relying on the attack profile similarities was [59], which used k-means clustering to move the fake profiles to the leaf nodes of a binary tree. With the user-item matrix and an optimal number of neighbors N, it recursively uses k-means clustering to cluster the users into two distinct groups. The indexed-cluster centers and intra-cluster correlation of the binary tree are used for attack profile detection. This approach's success rate is particularly high in the average, segment, and bandwagon attack models.
Yang et al. [60] developed an algorithm that focused on analyzing target users and items. It was a two-phase method. First, a density-based clustering method is applied to the dataset based on some selection features to identify malicious users. DBSCAN is used to determine the suspected users based on user features. Second, it spots suspicious items based on adaptive structure learning on selected features and further uses it to capture the attackers. The second phase helps in further scrutinizing the users from the first phase.
Zhang et al. [61] built a clustering approach based on the hidden Markov model (HMM) and hierarchical clustering. The users' rating behaviors are modeled using an HMM. Based on the users' preference sequence and modeled rating behavior, each user's suspicion degree is calculated. Then, a hierarchical clustering method is used to group these users, based on their suspicion degree, into genuine and attack user clusters. They also apply their method to a sampled Amazon review dataset to show its effectiveness.
Zhang et al. [62] proposed a method to improve the PCA approach in shilling profile detection. PCA is initially used to separate the profiles into two classes, with positive labels for the detected users and negative labels for all other users. Then they use the detection features (RDMA, WDMA, WDA, and LenVar) as data complexity features to calculate the CCMeasure of the dataset. CCMeasure is the classification complexity, a quantitative estimate of how difficult it is to classify the dataset. If the measure is high, it indicates that a significant number of authentic users are mislabeled, and the labels are flipped to reduce the data complexity. Table 4 shows the assumptions, traits, and downsides of using some of these algorithms.
Beyond the detection techniques themselves, the privacy risks that come with attack detection methods have also been studied. Luo and Liang [63] discuss the impact of an insider attack on shilling attack detection for recommendation systems. They consider a possible scenario where an attacker poses as an examiner who is kept from individual rating profiles by secure computations. Their attack model can infer the target rating profile from little prior knowledge and the output of the secure computations. Such an insider attack would pose a serious threat to the privacy of users.

D. DEFENSE AGAINST SHILLING ATTACKS
Parallel to the works focusing on shilling attack detection, there is a line of research intended to create robust algorithms that are immune to shilling attacks. These algorithms do not have a mechanism to find and remove the shilling profiles but can reduce the attack's effectiveness. We briefly discuss some of the recent robust algorithms in this subsection.
Yang et al. [64] combined the soft co-clustering algorithm with the user propensity similarity method to enhance the robustness of the recommender system and detect shilling attacks. It uses Bayesian co-clustering, a soft co-clustering algorithm that allows mixed membership of row and column, highly suitable for real data. This model combines RDMA with soft co-clustering to reduce the influence of shilling attacks. All the attack profiles are clustered into the same cluster, limiting the shilling influence amongst the attack profiles.
Turk and Bilge [65] developed a robust multi-criteria collaborative filtering (MCCF) algorithm. A multi-criteria CF system has multiple categories in which the user can rate each item, which helps in better understanding the likes and dislikes of a customer. The robustness in their method is achieved by eliminating suspicious ratings based on the degree of uncertainty. The users are also categorized into different groups based on preference similarities to restrict authentic users from mixing with attack profiles.
Deng et al. [66] integrated entropy scaling into the collaborative filtering process to reduce the impact of overly positive and overly negative users. They also used a minimum threshold to invert the entropy, further assisting in the prevention of random attacks.
Alonso et al. [67] calculated a reliability value for each prediction of a user to an item. When an unusual change is observed in the item prediction's reliability value, it indicates a possible shilling attack. They use the Matrix Factorization method to neutralize the impact of a shilling attack. Promoting such shilling predictions can be avoided to reduce the extent of the attack and neutralize the presence of shilling profiles. This method's performance drops with a decrease in the size of the attack, but it is claimed that such a small attack size has a negligible impact.
Current Trends in Shilling Attack Research: Fig. 6 shows the number of publications related to shilling attacks that came out each year. The figure covers both single and multi-criteria rating systems, drawn from top conferences and journals, and includes both supervised and unsupervised methods. The initial stages of shilling attack research focused on creating new attack models to estimate the impact of different attacks on the recommender system. The standard attacks were created in the early 2000s, but the increase in detection techniques during these initial stages led research to focus more on obfuscated attacks. The number of papers related to attack models has declined in recent years, and the focus has shifted towards detection techniques and robust algorithms. Fig. 6 also shows the gradual growth in detection methods over the years. It is important to note that some of the detection-related papers introduce a modified version of a known attack strategy that has a slightly greater attack impact on the system.

VI. SHILLING ATTACKS IN MULTI-CRITERIA AND IMPLICIT FEEDBACK SYSTEMS
The multi-criteria system aims to find the reason behind a user's opinion about a product [68]. In such recommender systems, the user is asked to rate the same item on multiple criteria, such as durability and service. This feature helps in understanding the various aspects of a product. For instance, consider a user purchasing an item from an e-commerce website who likes the product but not the seller's service. In such cases, a single-criterion rating system does not capture the user's opinion adequately, whereas a multi-criteria system can. In [42], Turk et al. conduct a shilling attack on a multi-criteria based system by modifying random, average, bandwagon, reverse bandwagon, and love/hate attacks to fit the multi-criteria condition. They also introduce an attack method where the most repeated rating, instead of the average rating, is assigned to the filler items. They experimentally show that the effect of using such a technique is superior to other methods. The existing literature on multi-criteria shilling attacks is limited.
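The "most repeated rating" idea can be sketched as below. The data layout, the per-item and per-criterion computation of the mode, the tie-breaking rule, and the function names are all assumptions made for illustration; [42] should be consulted for the actual attack design.

```python
from collections import Counter

def mode_rating(values):
    """Most frequent rating; ties are broken towards the lowest value
    (a tie-breaking rule assumed for this sketch)."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)

def build_push_profile(ratings, target, fillers, n_criteria, r_max=5):
    """Build one injected profile for a multi-criteria system.

    ratings: {user: {item: (rating per criterion, ...)}}
    The target item receives the maximum rating on every criterion; each
    filler item receives, per criterion, the most repeated observed rating.
    Every filler item is assumed to have at least one existing rating.
    """
    profile = {target: tuple([r_max] * n_criteria)}
    for item in fillers:
        per_criterion = []
        for c in range(n_criteria):
            observed = [user[item][c] for user in ratings.values()
                        if item in user]
            per_criterion.append(mode_rating(observed))
        profile[item] = tuple(per_criterion)
    return profile

# Hypothetical two-criterion ratings (for example, quality and service).
ratings = {
    "u1": {"A": (4, 3), "B": (2, 5)},
    "u2": {"A": (4, 2), "B": (3, 5)},
    "u3": {"A": (5, 3)},
}
profile = build_push_profile(ratings, target="T", fillers=["A", "B"],
                             n_criteria=2)
```

Replacing `mode_rating` with a rounded mean would give the corresponding average-style filler, which makes the two filler strategies easy to compare on the same data.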
Implicit feedback based recommender systems rely on a user's behavior, such as clicks, views, or purchases, to determine the likes and dislikes of the user [69]. The downside of using an explicit rating system is its intrusiveness. Most users end up not giving an explicit rating on e-commerce and other recommendation websites. This leads to sparse data and, subsequently, to subpar performance of the recommender. With implicit feedback, the users' interaction with the website can be used to collect relevant data about a user, ensuring consistent data collection, which eventually translates into good recommendations. Many sites employ a combination of explicit rating (single and multi-criteria) and implicit feedback systems. Shilling attacks are feasible in explicit rating systems because they are easy to implement, which is not the case for implicit feedback systems. Profile injection attacks on implicit feedback systems are therefore a possible future research direction.

VII. CONCLUSION AND FUTURE WORK
In this survey, we first discuss the different shilling attack types and describe them briefly. Second, we analyze how some of the obfuscated attack models are derived from the standard attacks. Third, we define the various detection attributes that are widely used in multiple detection techniques. Fourth, we interpret and categorize the characteristic traits that are used in the detection process. We then concisely examine the various detection and robust algorithms available. Finally, we briefly address the impact of shilling attacks on multi-criteria rating systems and implicit feedback systems.
In the future, we plan to work on attack possibilities and detection methods for multi-criteria collaborative filtering. Although many shilling attacks and prevention techniques exist for collaborative filtering, there is not enough research related to attacks on graph-based recommendation systems and implicit feedback systems. We will explore the feasibility of extending shilling attacks to these recommender systems.
AGNIDEVEN PALANISAMY SUNDAR received the B.E. degree from the Department of Electronics and Communication, Anna University, Chennai, India. He is currently pursuing the Ph.D. degree with the Department of Computer and Information Sciences, School of Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA. His research interests include security and privacy issues in graph networks, and vulnerabilities in recommender systems.
100 peer-reviewed articles in premier conferences and journals. His research has been supported by the U.S. NSF, the Department of Veterans Affairs, NASA, and industry such as Cisco and Northrop Grumman. His current research interests include applied cryptography, cybersecurity, and deep learning. He has served as the U.S. NSF Panelist, an NIH Program External Reviewer, a program chair, a member of technical program committees, on editorial boards, and a Reviewer for a number of international journals and conferences.
TIANCHONG GAO (Member, IEEE) received the Ph.D. degree in computer engineering from Purdue University, in December 2019. He is a Lecturer with the School of Cyber Science and Engineering, Southeast University, China. His Ph.D. advisor was Dr. F. Li. He joined the School of Cyber Science and Engineering, Southeast University, in April 2020. He has worked on problems on security, privacy, and social networks. He has published research articles in top-tier conferences and journals. His research vision is to explore the privacy issues in computing and networking.