Preference Mining Using Neighborhood Rough Set Model on Two Universes

Preference mining plays an important role in e-commerce and video websites for enhancing user satisfaction and loyalty. Some classical methods cannot handle the cold-start problem, in which the user or the item is new. In this paper, we propose a new model, called the parametric neighborhood rough set on two universes (NRSTU), to describe the user and item data structures. Furthermore, the neighborhood lower approximation operator is used for defining the preference rules. Then, we provide the means for recommending items to users by using these rules. Finally, we give an experimental example to show the details of NRSTU-based preference mining for the cold-start problem. The parameters of the model are also discussed. The experimental results show that the proposed method presents an effective solution for preference mining. In particular, NRSTU improves the recommendation accuracy by about 19% compared to the traditional method.


Introduction
In recent years, electronic retailers and content providers have offered a huge selection of products. This abundance makes it hard for users to choose among the many kinds of products (called items in this paper), such as books, music, and movies. Matching consumers with the most appropriate items is important for enhancing user satisfaction and loyalty.
Many researchers have proposed different solutions to discover the preference relations between users and items. The most popular such technology is collaborative filtering (CF) [1]. There are two primary disciplines of CF: the neighborhood approach and latent factor models. A user-based neighborhood approach [2] evaluates the preference of a user by analyzing the historical rating data of neighbors who have similar taste to this user. Similar taste is usually taken to mean that these users have rated most of the same items. Items that the neighbors like are then recommended to this user, as he/she will probably also like them. Latent factor models [3] build a two-dimensional matrix to describe the relation between users and items. The elements in the matrix are the ratings given by users. Thus the issue of preference mining translates to the issue of matrix completion. However, the classical CF-based method cannot work when it meets a new user who has never rated any item [4]. As a matter of fact, there are even some long-time users who have never rated any items. It is difficult to recommend an item to these users since there is no historical rating information to facilitate personalized recommendation. The key reason is that the CF method relies only on the historical rating data. This problem is called cold-start, and it is an intrinsic limitation of CF [4].
To deal with the cold-start problem [5], Schein et al. [6] developed a combined method for recommending items that no one has yet rated. Bobadilla et al. [7] proposed improved collaborative filtering to mitigate the new user cold-start problem. Content-based filtering methods [8,9] have also been studied for the cold-start problem. However, most researchers have addressed the cold-start problem where either the user or the item is new. The situation with both a new user and a new item has seldom been considered [4]. To cope with the cold-start problem when both the users and the items are new, Min and Zhu [4] pioneered a cold-start recommendation approach based on the rough set on two universes model (RSTU). First, the authors considered the features of users and items, such as age, gender, and price, to form equivalence-based information granules. Then they used the definition of the approximation operator in rough set theory to generate association rules between users and items. This granule-based recommendation model does not rely only on the rating data, so it can deal better with the cold-start problem. However, some problems in [4] still need to be studied. For example, the equivalence-based information granule is only suitable for dealing with nominal data, such as male or female, good or bad. If the attributes are numerical, such as the price of the items, we have to adopt a discretization technique to transform the nonnominal data to nominal data, which inevitably brings loss of information. Conversely, it is unreasonable in numerical methods to measure the similarity or dissimilarity of categorical attributes with Euclidean distance. Furthermore, the usual scoring method contains 5 scales, while the authors of [4] only consider whether or not a user has rated a movie. That is, a user will be labeled "like" if he has rated an item.
It means that a user who gives an item a low mark could be regarded as an admirer of the item. In fact, the low score shows that he dislikes the item. The key reason is that there is no evaluation of the user's rating baseline in [4]. As a result, positive rules and negative rules are also indistinguishable.
Thus, so far, little work in this field has succeeded in holding the promise of personalized recommendation because of the following problems: (1) cold-start problems; (2) loss of information by discretization; (3) no distinction between positive rules and negative rules.
To address the above problems, the contributions of this paper are as follows: (1) We construct the parametric neighborhood rough set model on two universes. Users and items are described through neighborhood granules, which overcomes the loss of information caused by discretizing the data. (2) The rating table is divided into a positive mapping and a negative mapping by using the rating baseline. The neighborhood lower approximation operator is used for defining the preference rules. Then, we can get the positive preference rules, which mean "like." Some negative preference rules, which mean "dislike," are also mined. (3) A recommendation method based on the neighborhood preference rules is proposed, which handles the cold-start problem well. The experimental results show that our model presents an effective solution for preference mining.
The paper is organized as follows: Section 2 briefly reviews the related work about cold-start issue and rough set theory on two universes. In Section 3, some basic concepts about neighborhood rough set model on single universe and granular computing-based preference mining are briefly reviewed. In Section 4, the data model and the method about baseline evaluation are investigated. In Section 5, we propose the parametric neighborhood rough set model on two universes (NRSTU). Section 6 shows the applications of NRSTU for preference rules mining and recommendation. Numeric experiments are reported in Section 7. Finally, Section 8 concludes the paper.

Related Work
In a real recommender system, users usually express their interest by five numeric scores [10]. The CF-based approach is based on the assumption that users who have similar rating behaviors can be grouped together to help each other choose among the potential items. However, the CF-based method is unavailable under the cold-start problem, which occurs when the recommender system is short of ratings. We can distinguish three kinds of cold-start problems: only new user, only new item, and both new user and new item [7][11][12][13]. New user means that we face a user who has never rated any item. It is hard for a new user to obtain the preferred items since no usage information can be employed in personalized recommendation [10]. An improved CF-based method such as CF-content or CF-demographic is the common strategy to tackle the new user problem [7]. For example, Braunhofer aimed at solving the cold-start problem by using various contextually tagged rating datasets [14]. Leung et al. proposed a new content-based hybrid approach that makes use of cross-level association rules to integrate content information about domain items [15]. Ahn presented a new heuristic similarity measure named PIP that focuses on improving recommendation performance under the new user condition [16]. All the former approaches are based on the assumption that the users are new. As a matter of fact, there could also be many new items in a real recommender system [17,18]. The new item problem arises because new items have no initial ratings. New items are not likely to be recommended since there is no rating information to facilitate personalized recommendation. The simplest solution is to encourage motivated users to take responsibility for rating each new item in the system [7]. In addition, the authors of [19] investigated the value of even a few ratings in regard to predictive power. Cremonesi et al. presented two different approaches for building hybrid collaborative content recommender systems to overcome the new item issue [5]. Elbadrawy and Karypis proposed a feature-based similarity model for top-n recommendation of new items [20]. In a word, the new item problem [21] is more often addressed.

New User and New Item.
The new user and new item problems are usually addressed individually. However, the situation with both a new user and a new item has seldom been considered. It is very difficult to recommend a new item to a new user since the historical data of both the user and the item are unknown. So far, the rough set model is the best way to deal with this situation. Min and Zhu studied the cold-start problem by using the rough set model on two universes. They provide a means for describing users and items through information granules, a means for generating association rules between users and items, and a means for recommending items to users using these rules [4]. Min and Zhu explained the concept of the granular association rule in detail [22]. Later, some rough set based methods were proposed to solve the new user and new item problem by extending Min's work [23,24].

Rough Set Model on Two Universes.
The rough sets theory [25], proposed by Pawlak in 1982, is a powerful mathematical method for the study of incomplete or imprecise information. This theory has been successfully applied to many fields, such as data mining, decision making, pattern recognition, machine learning, and intelligent control [26][27][28][29].
The Pawlak approximation operators are defined by an equivalence relation on the universe. The equivalence relation forms a partition of the universe. A partition or equivalence relation is still restrictive for many applications. To address this problem, Lin first proposed the theory of neighborhood systems [30][31][32] and pointed out the idea of neighborhoods of tolerance. Neighborhood theory can be seen as a breakthrough for rough sets. Zhu studied the covering-based rough set [33,34] and gave the definition of the smallest covering, which is a key issue in reducing redundant information in data mining. Zhu also investigated the relationships among basic concepts in covering-based rough sets, which helps us better understand covering-based rough sets. Neighborhood rough sets and covering-based rough sets are both meaningful extensions of the equivalence relation.
Most of this research has been conducted under the assumption of a single universe [26][27][28][29]. However, the two-universe model is more appropriate for the preference mining problem. In general, a given user may grade many items. Meanwhile, a given item may be rated by many users in a real recommender system. An effective way to describe this problem is to use two different universes: one is the set of all users and the other is the set of all items. Some results have been obtained in rough set theory on two universes. In [35], the authors gave a general framework of the two-universe based rough set model. Shen and Wang defined a variable precision rough set model based on the classical Pawlak model, in which the lower approximation and the upper approximation were generalized to two universes [36]. The concept of probabilistic rough sets on two universes was first defined by Gong and Sun [37]. In [38], the authors focus on the properties of probabilistic rough sets on two universes.
Previous studies are all extensions of the Pawlak rough set model to two universes. The equivalence-based information granules in the Pawlak rough set are not suitable for dealing with numerical data, so a discretization technique must be adopted to transform the numerical attributes to nominal ones. The strength of the approach presented in this paper lies in the ability of neighborhood granules to avoid discretizing the data for the new user and new item problem.

Preliminary
In this section, we first review the model of preference mining from the viewpoint of granular computing, which was proposed in [4,39]. Then, the classical neighborhood rough set model on a single universe is reviewed.

Preference Mining in the View of Granular Computing
Definition 1 (see [40]). Knowledge representation is realized via the information system (IS), which is a tabular form similar to databases. An information system is a pair $IS = (U, A)$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty finite set of objects and $A$ is a nonempty finite set of attributes.
Definition 2 (see [40]). An equivalence granule can be defined as follows: (1) In an information system, a granule coincides with a concept, which is a basic unit of human thought understood as a pair of intension and extension [41].
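As an illustration of the idea behind an equivalence granule, the following minimal Python sketch groups objects whose values agree on every chosen attribute. It assumes the standard rough set reading of an equivalence class; the function name `equivalence_granules`, its parameters, and the toy user table are hypothetical and are not taken from [40] or from MovieLens.

```python
from collections import defaultdict


def equivalence_granules(objects, attributes):
    """Partition objects into granules of identical attribute values.

    objects: dict mapping object id -> dict of attribute name -> value.
    attributes: attribute names used for the comparison (hypothetical).
    """
    granules = defaultdict(set)
    for obj_id, features in objects.items():
        # Objects with the same value tuple fall into the same granule.
        key = tuple(features[a] for a in attributes)
        granules[key].add(obj_id)
    return dict(granules)


# Toy user table with made-up values.
users = {
    "u1": {"gender": "F", "occupation": "student"},
    "u2": {"gender": "F", "occupation": "student"},
    "u3": {"gender": "M", "occupation": "educator"},
}
granules = equivalence_granules(users, ["gender", "occupation"])
```

Because membership requires exact equality, a numerical attribute such as age would have to be binned before it could be used in this way.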
The following definitions, which had been defined in the theory of rough set, were employed by Min and Zhu [39].
This data model can be called the two-universe model. In the preference mining method [39], $U$ can be considered as the set of users and $V$ as the set of items, where the two entities are connected by the relation R.
The above granular computing model is called classical rough set model on two universes.

Computational Intelligence and Neuroscience
Definition 5 (see [39]). A granular preference rule is an implication between a user granule and an item granule. In the view of rough set [39], the definition of the upper approximation is used for describing the preference rules, where the item granule is substituted for the subset of the item universe. The granular preference rule can be interpreted as stating that the users in the user granule like the items in the item granule.

Neighborhood Rough Set on the Single Universe.
As described in the previous sections, the classical rough set model is unavailable when the datasets are numerical. Hu et al. introduced a neighborhood rough set model in [43] for heterogeneous data to avoid discretization. The authors considered a data model on a single universe. Next, we give the basic concepts of the neighborhood rough set model on a single universe.
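A neighborhood granule is commonly understood as the set of objects lying within a distance threshold of a centre object. The sketch below is a minimal illustration under simplifying assumptions: all features are numerical and already scaled to [0, 1], and Euclidean distance is used. The model in [43] also covers nominal attributes, which this sketch does not handle. The name `neighborhood` and the toy ages are hypothetical.

```python
import math


def neighborhood(center, objects, delta):
    """Return the ids of objects within distance delta of `center`.

    objects: dict mapping object id -> tuple of scaled numerical features.
    """
    cx = objects[center]
    return {
        obj_id
        for obj_id, x in objects.items()
        if math.dist(cx, x) <= delta
    }


# Made-up, pre-scaled ages for three users.
ages = {"u1": (0.28,), "u2": (0.30,), "u3": (0.47,)}
near_u1 = neighborhood("u1", ages, delta=0.15)
```

Unlike an equivalence granule, the granule around `u1` contains `u2` even though their ages differ, so no binning of the numerical attribute is needed.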
Obviously, the classical neighborhood rough set model on a single universe is not suitable for the user-item data structure.

Data Model.
In this study, we illustrate movie recommendation with MovieLens [44], which is widely used in preference mining research (e.g., [4,10,39]). We use the version with 943 users and 1,682 movies. Thus, the items are movies in this study. The database schema is as follows.
According to Definitions 1 and 3, $U$ and $V$ are the sets of users and movies. The sets $A$ and $B$ are the features of users and movies, respectively. The specific information can be found in Tables 3 and 4 in Section 7.
The original rating data contains 5 scales, as shown in Table 1. Reference [4] only considers whether or not a user has rated a movie. That is, a mapping relationship R from $u_{13}$ to $m_{219}$ is established because $u_{13}$ has rated $m_{219}$. An example of the set-valued mapping R is given in Table 2. Then we can mine some preference rules from this type of mapping relationship.
As a matter of fact, a rating of 1.0 point means that $u_{13}$ in all probability dislikes $m_{219}$. In other words, some of these rules are negative rules, where "negative" means "dislike" in this study. Obviously, this type of mapping relationship is unreasonable. Therefore, how to establish an appropriate mapping relationship from users to movies is a key issue for preference mining.

Baseline Evaluation.
The rating baseline tends to capture the basic emotions of a user for one item [3]. In this study, it can be considered that a user likes a movie if he (she) gives a higher score than his (her) rating baseline. The simplest computational method of the baseline is the overall average rating of movies. For example, suppose the average rating over all movies is 3.7 stars. User Joe can then be regarded as an admirer of the movie Avatar if he has given Avatar 4.0 points. However, the authors of [3,45] pointed out that user and item biases exist in real rating systems. Some users, for instance, frequently give higher ratings than others. Similarly, some items always receive higher ratings than others. That is to say, the rating baseline should be personalized. Therefore, we use the baseline evaluation method in [45]. A user's rating baseline for a movie accounts for the user and item bias effects.
The baseline estimate is $b_{ui} = \mu + b_u + b_i$, where $\mu$ is the overall average rating. The parameters $b_u$ and $b_i$ indicate the rating bias of user $u$ and item $i$, respectively. We also use the example about Avatar to explain the meaning of $b_{ui}$. Avatar is more popular than an average movie, so it tends to be rated 0.5 stars above the average. On the other hand, Joe is a critical user, who tends to rate 0.1 stars lower than the average. Thus, the baseline estimate for Avatar's rating by Joe would be 4.1 stars by calculating 3.7 − 0.1 + 0.5. It means Joe might not be interested in Avatar if he has given Avatar 4.0 points.
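The Avatar example can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the baseline is the sum of the overall average, the user bias, and the item bias; the function names `baseline` and `likes` are illustrative.

```python
def baseline(mu, b_u, b_i):
    """Baseline estimate: overall average plus user and item biases."""
    return mu + b_u + b_i


def likes(rating, mu, b_u, b_i):
    """A user is taken to like an item when the rating exceeds the baseline."""
    return rating > baseline(mu, b_u, b_i)


# Values from the Avatar example: average 3.7, Joe's bias -0.1, Avatar's bias +0.5.
joe_avatar = baseline(mu=3.7, b_u=-0.1, b_i=0.5)
joe_likes_avatar = likes(4.0, mu=3.7, b_u=-0.1, b_i=0.5)
```

With the plain overall average (3.7) as the baseline, a 4.0 rating would count as "like"; with the personalized baseline of about 4.1 it does not.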
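In order to estimate $b_u$ and $b_i$ one can solve the least squares problem [1,3,45]:
$$\min_{b_*} \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2 + \lambda \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big).$$
Here, $K$ is the set of all training samples. The first term $\sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2$ strives to find $b_u$'s and $b_i$'s that fit the given ratings $r_{ui}$. The regularizing term $\lambda (\sum_u b_u^2 + \sum_i b_i^2)$ avoids overfitting by penalizing the magnitudes of the parameters. Then we can take the derivative with respect to $b_u$ and $b_i$.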
The parameter is set to 0.003 in this study. Thus, the overall average rating and the user and item biases can be calculated with the above formulas.
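One common way to solve such a regularized least squares problem is stochastic gradient descent. The sketch below is illustrative only: it assumes that the 0.003 setting is the regularization weight (`lam`), and the learning rate `lr`, the epoch count, and the toy ratings are hypothetical choices rather than values from the study.

```python
from collections import defaultdict


def fit_baseline(ratings, lam=0.003, lr=0.05, epochs=500):
    """Estimate the overall mean and the user/item biases.

    ratings: list of (user, item, rating) triples.
    lam: regularization weight (assumed meaning of the 0.003 setting).
    lr, epochs: illustrative optimizer settings.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u = defaultdict(float)
    b_i = defaultdict(float)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i])
            # Gradient step on err^2 + lam * (b_u^2 + b_i^2).
            b_u[u] += lr * (err - lam * b_u[u])
            b_i[i] += lr * (err - lam * b_i[i])
    return mu, dict(b_u), dict(b_i)


# Toy ratings: Ann rates higher than Joe, and "avatar" is rated above "heat".
toy_ratings = [
    ("joe", "avatar", 4.0), ("joe", "heat", 3.0),
    ("ann", "avatar", 5.0), ("ann", "heat", 4.0),
]
mu, user_bias, item_bias = fit_baseline(toy_ratings)
```

On the toy data the fitted biases reflect that Ann is more generous than Joe and that the first movie is rated above the second.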
We use this baseline evaluation method for building the mapping relationship between the two universes. A user can be considered as an admirer of a movie if he gives a higher rating than the baseline of the movie. The fourth column in Table 1 shows the results by the baseline estimate. Then the set-valued mapping R is rebuilt in the fourth column in Table 2.
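The division of the rating table into a positive and a negative mapping can be sketched as follows. The two ratings and baselines are the $u_{13}$/$m_{219}$ and $u_{230}$/$m_{280}$ values quoted in the experiments; the function name `split_mapping` is hypothetical, and treating a rating equal to the baseline as "dislike" is an assumption.

```python
def split_mapping(ratings, baseline_of):
    """Split ratings into positive ("like") and negative ("dislike") mappings.

    ratings: iterable of (user, item, rating) triples.
    baseline_of: function (user, item) -> baseline estimate.
    """
    positive, negative = {}, {}
    for user, item, rating in ratings:
        # Assumption: only ratings strictly above the baseline count as "like".
        target = positive if rating > baseline_of(user, item) else negative
        target.setdefault(user, set()).add(item)
    return positive, negative


ratings = [("u13", "m219", 1.0), ("u230", "m280", 4.0)]
baselines = {("u13", "m219"): 2.3, ("u230", "m280"): 2.8}
positive, negative = split_mapping(ratings, lambda u, m: baselines[(u, m)])
```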

Neighborhood Rough Set Model on Two Universes
In this section, we build the neighborhood rough set model on two universes (NRSTU).

Definition 10. Let $U$ and $V$ be two nonempty finite universes. $\delta(x)$ is a neighborhood in $U$ whose centre is $x \in U$. R is the set-valued mapping from universe $U$ to $V$, where $R(x) \subseteq V$. Then, we have the object set $R(\delta(x)) \subseteq V$ if $\delta(x) \subseteq U$. Hence, $(U, V, R, \delta)$ is called a neighborhood approximation space on two universes. $R(\delta(x))$ is the set of the elements in $V$ which are mapped from $\delta(x) \subset U$.
Proof.
Hence, $\mathrm{apr}_1(\cdot) \supseteq \mathrm{apr}_2(\cdot)$. In a practical environment, the neighborhood upper approximation is inadequate for describing user preference. For example, we need to know how many of the items in a granule the users like. Here, we propose a parametric neighborhood rough set model on two universes, where $|\cdot|$ is the cardinality of a set.
We proposed a variable precision neighborhood rough set model in previous research [46]. In this study, the parameter is the coverage degree of $R(\delta(x))$ with respect to the target set. It does not mean the precision of an approximation. This is why we call this model parametric neighborhood rough sets instead of variable precision neighborhood rough sets.
Definition 14. Let $(U, V, R, \delta, \beta)$ be a variable precision neighborhood approximation space on two universes. For any $x \in U$, the definition gives the neighborhood uncertainty of $x$ and then the uncertainty of the approximation space.
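The following Python sketch shows one reading of the parametric neighborhood lower approximation that is consistent with the description above. It assumes that the image of a user granule is the union of the items mapped from its members, and that the coverage degree is the share of a target item set covered by that image. The names `image`, `coverage`, `parametric_lower`, and `beta`, as well as the toy data, are hypothetical.

```python
def image(mapping, granule):
    """Items mapped from any user in the granule (union of their images)."""
    items = set()
    for user in granule:
        items |= mapping.get(user, set())
    return items


def coverage(mapping, granule, target):
    """Share of the target item set covered by the granule's image."""
    if not target:
        return 0.0
    return len(image(mapping, granule) & target) / len(target)


def parametric_lower(mapping, neighborhoods, target, beta):
    """Centres whose neighborhood covers at least `beta` of the target set."""
    return {
        x
        for x, granule in neighborhoods.items()
        if coverage(mapping, granule, target) >= beta
    }


# Toy positive mapping and user neighborhoods (made-up data).
liked = {"u1": {"m1", "m2"}, "u2": {"m2", "m3"}, "u3": {"m4"}}
neighborhoods = {"u1": {"u1", "u2"}, "u2": {"u1", "u2"}, "u3": {"u3"}}
target = {"m1", "m2", "m3", "m4"}
lower = parametric_lower(liked, neighborhoods, target, beta=0.5)
```

Here the granule around `u1` covers three of the four target items, so `u1` and `u2` belong to the approximation at a threshold of 0.5, while `u3` does not.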

Preference Rules and Recommendation
In this section, we first define the preference rules by using the neighborhood rough set models on two universes. Then the method of recommendation is also discussed in detail.

Preference Rules.
Min has given the formulation of the preference rule from the viewpoint of the equivalence granule [4]. In our study, we define a neighborhood granular preference rule by extending Min's work.
Definition 15. Given a neighborhood approximation space on two universes $(U, V, R, \delta)$ and $x \in U$, $y \in V$, a neighborhood granular preference rule can be described as follows.
The formulation is consistent with the upper approximation in Definition 11. In an information system, a granule is a basic unit of human thought [41]. In a regular recommender system, such as the CF method, the meaning of neighborhood is that similar people have similar interests [45]. Therefore, the definition of the neighborhood upper approximation can be interpreted as stating that some similar users in $\delta(x)$ like the similar items in $\delta(y)$.
The rules are usually evaluated through two measures, namely, support degree and confidence degree, which are well defined in [47] for the single universe model. In the two-universe model, a rule has a user support degree and an item support degree.
On the other hand, a higher proportion of items in $\delta(y)$ that the users like also indicates that the preference rule is stronger. Hence, we also use a third measure, called the confidence degree, for mining the stronger rules.
Thus, the parameter $\beta$ in the parametric neighborhood rough set model can serve as the confidence degree of the rules. This is why we propose the parametric neighborhood rough set model in Definition 13. The formulation of preference rules is redefined by using the lower approximation in the parametric neighborhood rough set model.
Here, the neighborhood lower approximation is employed for the definition of the preference rule. This type of rule can be read as "the users in $\delta(x)$ like at least $\beta \times 100\%$ of the items in $\delta(y)$." In our study, we only consider the condition that the users and items are described by all of the features.
A straightforward algorithm for preference rules mining is given by Algorithm 1 which has two steps.
Step 1. Search all neighborhood granules meeting the minimal support thresholds of the user granule and the item granule. This step corresponds to Lines 1 and 2 of the algorithm, where user and item stand for user granules and item granules, respectively.
Step 2. Check all possible rules regarding user and item and output valid ones. This step corresponds to Line 3 through Line 9 of the algorithm.
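The two steps can be sketched in Python as below. This is not a transcription of Algorithm 1: it assumes that the support degree of a granule is its size divided by the size of its universe, and that a rule is valid when the items liked by the user granule cover at least a fraction `beta` of the item granule. All names and the toy data are hypothetical.

```python
def mine_rules(user_nbhd, item_nbhd, mapping, n_users, n_items,
               min_user_support, min_item_support, beta):
    """Return (user centre, item centre) pairs that form valid rules."""
    # Step 1: keep the granules that meet the support thresholds.
    user_granules = {x: g for x, g in user_nbhd.items()
                     if len(g) / n_users >= min_user_support}
    item_granules = {y: g for y, g in item_nbhd.items()
                     if len(g) / n_items >= min_item_support}
    # Step 2: check every candidate rule and output the valid ones.
    rules = []
    for x, user_granule in user_granules.items():
        liked = set()
        for user in user_granule:
            liked |= mapping.get(user, set())
        for y, item_granule in item_granules.items():
            if item_granule and len(liked & item_granule) / len(item_granule) >= beta:
                rules.append((x, y))
    return rules


# Toy neighborhoods and positive mapping (made-up data).
user_nbhd = {"u1": {"u1", "u2"}, "u2": {"u1", "u2"}, "u3": {"u3"}}
item_nbhd = {"m1": {"m1", "m2"}, "m2": {"m1", "m2"}, "m3": {"m3"}}
mapping = {"u1": {"m1"}, "u2": {"m2"}, "u3": {"m3"}}
rules = mine_rules(user_nbhd, item_nbhd, mapping, n_users=3, n_items=3,
                   min_user_support=0.5, min_item_support=0.5, beta=0.5)
```

The exhaustive double loop mirrors the "check all possible rules" wording; it is quadratic in the number of granules and is meant for clarity rather than speed.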

Preference Rules for Recommendation.
The recommendation method is derived from the idea that neighbors have the same taste. Given a preference rule $\delta(x) \Rightarrow \delta(y)$ and a new user $x'$, we recommend the items in $\delta(y)$ if $x'$ is a neighbor of $x$.
The performance of a recommender is evaluated mainly by the recommendation accuracy [4]. Formally, the accuracy is the number of appropriate recommendations divided by the number of recommended items [4]. In our study, a recommendation is appropriate if the real score of the recommended item is higher than the rating baseline of the user. We elaborate on this in the following experiments.
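A minimal Python sketch of rule-based recommendation and of the accuracy measure follows. The rule and the rating and baseline of $m_{280}$ are the values quoted in the experiments; the second movie `m300`, its rating and baseline, the neighbor table, and the function names are hypothetical. Counting a recommended item with no known rating as inappropriate is also an assumption.

```python
def recommend(new_user, rules, is_neighbor, item_granules):
    """Collect the items of every rule whose user centre neighbors new_user."""
    recommended = set()
    for x, y in rules:
        if is_neighbor(new_user, x):
            recommended |= item_granules[y]
    return recommended


def accuracy(recommended, true_ratings, baselines):
    """Share of recommended items rated above the user's baseline."""
    if not recommended:
        return 0.0
    appropriate = sum(
        1 for item in recommended
        if item in true_ratings and true_ratings[item] > baselines[item]
    )
    return appropriate / len(recommended)


rules = [("u32", "m1680")]
item_granules = {"m1680": {"m280", "m300"}}
neighbors_of = {"u32": {"u230"}}
recs = recommend("u230", rules,
                 lambda u, x: u in neighbors_of.get(x, set()),
                 item_granules)
true_ratings = {"m280": 4.0, "m300": 2.0}
baselines = {"m280": 2.8, "m300": 3.0}
acc = accuracy(recs, true_ratings, baselines)
```

In this toy case one of the two recommended movies is rated above its baseline, so the accuracy is one half.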

Experiments
In this section, we evaluate our model through experiments. In this study, the method of preference mining is improved in the following two aspects.
(1) NRSTU is proposed to overcome the cold-start problem.
(2) NRSTU enhances the effectiveness of preference mining by avoiding discretization and by separating positive rules from negative rules.
We design three experiments to verify the two points above. First, we give an experimental example to show the details of NRSTU-based preference mining. It elaborates how to solve the cold-start problem by using NRSTU. Then, we discuss the parameters of our model through experimental analysis. Finally, to assess the effectiveness of NRSTU, we choose the classical rough set model on two universes (RSTU) as the benchmark. RSTU is employed in [4,39] for preference mining. We download the discretized samples from [48], which were preprocessed by Min as in [39]. This experiment is called NRSTU versus RSTU.

The Meaningfulness of Neighborhood Preference Rules and Recommendation.
First, we look at some rules mined from the MovieLens dataset by Algorithm 1. The setting is as follows: the user and item support thresholds are both 0.02 and the confidence degree is 0.3. The training and test samples account for 90% and 10% of the data, respectively. We can get many positive rules from the users who are labeled as liking the movies. Similarly, some negative rules are obtained from the users who dislike those movies. Some of them are listed below as examples.
(7) user(940) ⇒ movie(192).
As in Tables 3 and 4, we use 8 movies and 7 users to expound the meaningfulness of neighborhood preference rules. For example, user(32) ⇒ movie(1680) means that the users in user(32) like the movies in movie(1680). It can be interpreted as follows: female students who are about 28 years old like movies which are labeled drama and romance. Then, we can recommend movies to other users by using the neighborhood preference rules as in Algorithm 2. Suppose $u_{230}$ is a new user who has not rated any of the movies. The classical CF-based method cannot work if we do not know any of the historical rating data of $u_{230}$. We can, however, get her registration information, from which we know she is a female student. Hence, we can recommend the movie $m_{280}$ to her because $m_{280} \in$ movie(1680). As a matter of fact, $m_{280}$ is scored 4.0 points by user $u_{230}$, where the baseline of $u_{230}$ on movie 280 is 2.8 (see Table 1). It means that user $u_{230}$ really enjoys the movie $m_{280}$. Here we say it is an appropriate recommendation.

Algorithm 2: Recommendation method.
Input: Test users, all rules
Output: The recommendations for the test users
for each test user
    for each rule
        if the test user is a neighbor of the centre of the rule's user granule
            recommend the rule's item granule for the test user
        end if
    end for
end for
On the other hand, our method also mines some negative preference rules, which solves the problem in [4,39]. Those authors only consider whether or not a user has rated a movie. A user will be labeled "like" if he has rated a movie. It means that a user who gives a movie a low mark could be regarded as an admirer of the movie. That is to say, the negative preference rules (5), (6), and (7) will be regarded as positive preference rules in the view of [4,39]. For example, as in Table 1 in Section 3, suppose $u_{13}$ is a new user who is a neighbor of $u_{937}$. Movie $m_{219}$ will be recommended to $u_{13}$ because $m_{219} \in$ movie(861). As a matter of fact, $m_{219}$ is scored 1.0 point by user $u_{13}$, where the baseline of $u_{13}$ on movie 219 is 2.3. It means that user $u_{13}$, who is a 47-year-old male educator, does not like the horror movie at all. Apparently it is an improper recommendation. This experiment shows that baseline evaluation is an important step in preference mining.

Parameters Discussion.
There are four parameters in our model: the neighborhood metric $\delta$, the confidence degree $\beta$, and the user and item support degrees. Literature [43] has explained that the result is optimal if the threshold $\delta$ is set between 0.1 and 0.2 in the neighborhood system. In our study, $\delta$ is set to 0.15. Then, the selection of the other three parameters is discussed through a series of experiments. We set the two support degrees equal and vary them from 0.02 to 0.12 with step 0.02, because we cannot get any rules when they exceed 0.12. We try confidence degrees from 0.05 to 1.0 with step 0.05. For the MovieLens datasets, we randomly divide the samples into 10 subsets and use nine of them as the training set and the remaining one as the test set. After 10 rounds, we compute the average recommendation accuracy as the final performance.
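The evaluation protocol described above (ten random subsets, nine for training and one for testing, averaged over ten rounds) can be sketched as follows. The function name `ten_fold_accuracy`, the `evaluate` callback, the fixed seed, and the dummy scoring function are hypothetical; in practice `evaluate` would mine rules on the training part and return the recommendation accuracy on the test part.

```python
import random


def ten_fold_accuracy(samples, evaluate, folds=10, seed=0):
    """Average evaluate(train, test) over `folds` random splits."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled samples into `folds` subsets of near-equal size.
    subsets = [shuffled[k::folds] for k in range(folds)]
    scores = []
    for k in range(folds):
        test = subsets[k]
        train = [s for j, part in enumerate(subsets) if j != k for s in part]
        scores.append(evaluate(train, test))
    return sum(scores) / folds


def dummy_evaluate(train, test):
    """Placeholder score: the share of samples held out for testing."""
    return len(test) / (len(train) + len(test))


average_score = ten_fold_accuracy(range(100), dummy_evaluate)
```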
According to the experimental data above, we can draw some useful conclusions. First of all, Figures 1-6 show that the highest recommendation accuracy stays between 45% and 50% in most cases if the confidence degree $\beta$ satisfies $0 < \beta \le 0.55$. When we increase $\beta$ from 0.45 to 0.55, the number of rules decreases accordingly. However, the total and appropriate recommendations do not significantly decrease in this interval. The number of rules drops to zero if $\beta \ge 0.65$, and then there are no longer any recommendations. That is, we can obtain the highest recommendation accuracy with fewer rules when $0.45 < \beta \le 0.55$. It can be concluded that some preference rules are redundant if $\beta$ is set in $(0, 0.45]$. In practice, we want many recommendations and the highest recommendation accuracy from the minimum number of rules. From this point of view, $0.45 < \beta \le 0.55$ is a cost-effective choice for the confidence degree. The two support degrees only affect the numbers of rules and recommendations, as shown in Figures 1(b), 2(b), 3(b), 4(b), 5(b), and 6(b). The recommendation accuracy has nothing to do with the support degrees. It is obvious that the higher the support degree we set, the fewer rules and recommendations we get. Nonetheless, a high support degree means strong preference rules. Consequently, it is more reasonable to set the support degree based on actual demand. It depends on which is more needed in a real recommender system: stronger preference rules or more recommendations. We can even set different values for the two support degrees on a case-by-case basis. Hence, it remains an open-ended question.

NRSTU versus RSTU.
In this section, we choose RSTU [4] as the benchmark. In [4], the authors proposed a cold-start recommendation approach by using the classical parametric rough set model. The main difference between NRSTU and RSTU is that the granules in [4] are structured by the equivalence relation rather than the neighborhood relation. It means that users (or items) are sorted into one granule only if their features are identical. Actually, the features are rarely exactly alike because of numerical features such as age and release year. Therefore, RSTU has to adopt a discretization technique to transform the nonnominal data to nominal data, which inevitably brings loss of information. In this experiment, the setting is as follows: the user and item support thresholds are both 0.02 and the confidence degree is 0.5. The random recommender, which has an accuracy close to 6.2%, is included for comparison as in [4]. As shown in Table 5, NRSTU clearly yields better performance than RSTU and the random recommender. In particular, NRSTU improves the recommendation accuracy by about 19%. There are two reasons. The first is that RSTU relies on discretization, which loses structure that is needed to measure similarity between samples. Hu et al. [43] pointed out that at least two categories of structure are lost in discretization: the neighborhood structure and the order structure in real spaces. For example, in real spaces we know the distances between samples and can tell how close the samples are to each other. NRSTU addresses this deficiency by the neighborhood granule structure. The second is that there is no distinction between positive rules and negative rules in [4]. Consequently, RSTU gives some unreasonable recommendations, as discussed in the first experiment. Thus it can be seen that NRSTU is more effective for dealing with the problem of preference mining.

Conclusion and Future Work
In this study, we proposed the parametric neighborhood rough set model on two universes for dealing with the problem of preference mining. First, baseline evaluation is investigated for dividing the positive and negative mappings. Furthermore, the definition of the neighborhood lower approximation is proposed to define the user preference rules. The algorithms for preference rule mining and item recommendation are also given. Experiment 1 elaborates how NRSTU can overcome the cold-start problem. It also shows that baseline evaluation can avoid some unreasonable recommendations. The parameters of NRSTU are discussed in detail in Experiment 2. The last experiment shows that NRSTU improves the recommendation accuracy by about 19%. It can be concluded that NRSTU is more effective for dealing with the problem of preference mining. Future work could move along two directions. First, many other rough set models, such as the kernel rough set and the fuzzy rough set, can also be used to describe the user preference rules. A comparative analysis of the effectiveness of these rough set models for preference mining is an important issue. Second, it is necessary to apply our model to big data. Consequently, a version of NRSTU within a distributed framework requires further attention.