Pseudolabel Decision-Theoretic Rough Set

Introduction
Different from the classical rough set and its various generalizations [1][2][3], the decision-theoretic rough set (DTRS) [4][5][6] has been demonstrated to be useful in many cost-related problems [7][8][9][10][11]. This model introduces not only the Bayesian decision procedure but also the minimal risk into the construction of the lower and upper approximations. Immediately, the thresholds for defining the approximations gain clear explanations.
From the viewpoint of granular computing [12], a binary relation provides us with an effective mechanism for realizing information granulation. Therefore, the model of DTRS can be used to characterize the relationship between the results of information granulation (information granules) and the decision classes through considering the problem of costs. Consequently, DTRS involves two important aspects: (1) information granulation and (2) decision costs. On the one hand, different types of information granulation may be used to construct different models [4,[13][14][15][16][17][18][19]. For example, the classical DTRS is formed based on the indiscernibility relation [4]; similar to Pawlak's rough set, such a model is only useful in analyzing categorical data. Li et al. [16] have proposed a neighborhood based DTRS, in which the result of information granulation is expressed by the neighborhoods of samples; Song et al. [15] have proposed a fuzzy based DTRS, in which the result of information granulation is reflected by a fuzzy relation. These representative results are suitable for analyzing data with continuous or even mixed values. On the other hand, different problems related to costs also provide us with more directions for studying DTRS [20,21]. For instance, Dou et al. [10] have taken the characteristics of multiplicity and variability of costs into consideration and proposed three different multicost based DTRS models; Liang et al. [22] have introduced interval-valued costs into DTRS, which enlarges the application range of DTRS; to solve the problem of multiclass decision effectively, Yang et al. [23] have proposed a sequential three-way approach.
From the discussions above, it should be emphasized that, though different binary relations have been used in constructing DTRS, few of them take the labels of samples into account. Therefore, the discriminative ability may be limited. This is mainly because, for any two samples, their similarity is determined only by the condition attributes or features, regardless of the labels of the two samples. Consequently, samples with different labels may meet the requirement of the binary relation, which will produce extra decision costs [5].
What is more, in the studies of neighborhood systems [24][25][26], using distance functions is a very common and useful method. For example, Hu et al. [27] proposed a neighborhood based rough set model, which is easy to understand and implement; Li et al. [16] have replaced the equivalence class used in traditional DTRS by the neighborhood relation. Correspondingly, we also use the neighborhood relation in this paper. Take a binary classification task as an example (see Figure 1). There are 5 samples in the neighborhood of sample x besides itself. That is, any one of the samples x_i (i = 1, 2, 3, 4, 5) and x satisfy the neighborhood relation. In addition, samples x_1, x_2, and x_3 have the label of triangle, while x_4 and x_5 have the label of square. Obviously, x will be misclassified if the majority rule is employed; such misclassification may produce additional cost if the cost sensitivity problem is considered. To reduce this type of misclassification and the corresponding cost, a natural idea is to use a smaller scale of neighborhood [28]. However, it does not always work. For example, in Figure 1, if the value of the radius keeps decreasing, then it is possible that only x_2 remains in the neighborhood of x if reflexivity is ignored. In this case, x will also be misclassified.
Why is reducing the value of the radius not always effective? This is mainly because the distance between samples is determined only by the condition attributes or features, while the label information of samples is ignored. Lower-quality attributes may not be good enough to provide better discrimination. From this point of view, a new technique for measuring the similarity between samples, one which considers the information provided by the labels, has become a necessity.
As we all know, in rough set theory, decision attribute offers us the labels of samples. However, these labels cannot be directly used, mainly because the labels over decision attribute are used to generate decision classes for constructing approximations. It is unreasonable to approximately characterize the decision classes by using the information which is derived from the decision attribute itself. For such reason, new sources of the labels of samples should be considered.
Fortunately, motivated by the results of the pseudolabel strategy in unsupervised and semisupervised learning tasks [29][30][31][32][33][34], we know that the pseudolabels of samples provide us with another representation of labels. Therefore, it may be a useful attempt to introduce the pseudolabel strategy into DTRS. Figure 2 reports the framework of our study. It must be emphasized that our research is based on neighborhood related DTRS. The reason is that the neighborhood is suitable for dealing with complex data while keeping a simple structure. Furthermore, the neighborhood offers us a natural structure of multigranularity with the variation of radii. Obviously, following Figure 2, our main contribution is the pseudolabel strategy. In other words, the neighborhood of a sample is determined not only by the binary relation, but also by the pseudolabels. Correspondingly, the superiority of the pseudolabel strategy is that it takes the labels of samples into account, so that the discrimination of samples can be improved.
The remainder of this paper is organized as follows. The basic knowledge about decision-theoretic rough set and neighborhood rough set is presented in Section 2. In Section 3, the model of pseudolabel neighborhood decision-theoretic rough set (PLNDTRS) is proposed. To further reduce the decision costs, the problem of attribute reduction is explored in Section 4. In Section 5, not only the experimental comparisons are presented, but also the effectiveness of our strategy is analyzed. The paper ends with conclusions and perspectives for future work in Section 6.

Preliminary Knowledge
2.1. Decision-Theoretic Rough Set. In this section, some basic notions about DTRS [5,16] are presented. Similar to the introduction of other rough sets, the concept of a decision system should first be described for simplifying the subsequent discussions.
Definition 1. A decision system can be represented as DS = ⟨U, AT, d⟩, in which U = {x_1, x_2, ..., x_n} is a nonempty finite set of samples, called the universe; AT is the set of condition attributes; d is the decision attribute. ∀x ∈ U, d(x) expresses the label of sample x, and a(x) denotes the value that x holds on condition attribute a (∀a ∈ AT).
Given a decision system DS, the classification task will be considered in this paper and then the labels of samples are discrete. Consequently, an equivalence relation over U can be obtained such that IND(d) = {(x, y) ∈ U × U : d(x) = d(y)}. Immediately, U/IND(d) = {X_1, X_2, ..., X_q} is regarded as the partition determined by IND(d). ∀X_k ∈ U/IND(d), X_k is the set of samples with the same label and is then referred to as the k-th decision class. Specially, the decision class which contains sample x is denoted by [x]_d.
To know what DTRS is, the following notions should be given. Let Ω = {X, ∼X} indicate that a sample is in or out of X; that is, Ω is the set of states of X. Furthermore, ∀X ⊆ U, the three actions in A = {a_P, a_B, a_N} indicate that x belongs to X, belongs to X possibly, or does not belong to X, respectively. Therefore, the loss function regarding the costs of the three actions in the two different states is given in Table 1. In Table 1, λ_*P (* ∈ {P, B, N}) denotes the cost for taking action a_P, a_B, or a_N when x actually belongs to X, and λ_*N (* ∈ {P, B, N}) denotes the cost for taking action a_P, a_B, or a_N when x actually does not belong to X. Following the costs shown in Table 1 and the conditional probability, the Bayesian expected costs of the three decisions can be defined as

R(a_P | [x]_d) = λ_PP · Pro(X | [x]_d) + λ_PN · Pro(∼X | [x]_d);
R(a_B | [x]_d) = λ_BP · Pro(X | [x]_d) + λ_BN · Pro(∼X | [x]_d);
R(a_N | [x]_d) = λ_NP · Pro(X | [x]_d) + λ_NN · Pro(∼X | [x]_d).

Correspondingly, the following minimum-cost decision rules can be derived:

(P) if R(a_P | [x]_d) ≤ R(a_B | [x]_d) and R(a_P | [x]_d) ≤ R(a_N | [x]_d), decide x ∈ POS(X);
(B) if R(a_B | [x]_d) ≤ R(a_P | [x]_d) and R(a_B | [x]_d) ≤ R(a_N | [x]_d), decide x ∈ BND(X);
(N) if R(a_N | [x]_d) ≤ R(a_P | [x]_d) and R(a_N | [x]_d) ≤ R(a_B | [x]_d), decide x ∈ NEG(X);

where POS(X) denotes the positive region of X, BND(X) denotes the boundary region of X, and NEG(X) denotes the negative region of X. Samples in POS(X) belong to X determinately, samples in BND(X) belong to X possibly, and samples in NEG(X) do not belong to X.
Since Pro(X | [x]_d) + Pro(∼X | [x]_d) = 1, the above rules can be expressed with the conditional probability Pro(X | [x]_d) only. In addition, if we assume the following condition for the loss function [5,37]:

λ_PP ≤ λ_BP < λ_NP and λ_NN ≤ λ_BN < λ_PN,

then we have 0 ≤ β < γ < α ≤ 1, in which

α = (λ_PN − λ_BN) / ((λ_PN − λ_BN) + (λ_BP − λ_PP));
β = (λ_BN − λ_NN) / ((λ_BN − λ_NN) + (λ_NP − λ_BP));
γ = (λ_PN − λ_NN) / ((λ_PN − λ_NN) + (λ_NP − λ_PP)).

In this case, the final decision rules associated with the conditional probability and thresholds can be obtained:

(P1) if Pro(X | [x]_d) ≥ α, decide x ∈ POS(X);
(B1) if β < Pro(X | [x]_d) < α, decide x ∈ BND(X);
(N1) if Pro(X | [x]_d) ≤ β, decide x ∈ NEG(X).

2.2. Neighborhood Rough Set. Neighborhood rough set is another generalization of the classical rough set. It has two main advantages: (1) it is suitable for dealing with continuous or even mixed data, mainly because the neighborhood relation is constructed from the distances between samples; (2) the scale of the neighborhood provides us with a flexible technique to control the granularity [38], and then the structure of multigranularity [39,40] can be naturally formed.
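The threshold computation above can be sketched directly in code (a minimal Python sketch, not the authors' MATLAB implementation; the function name and argument order are our own):

```python
def dtrs_thresholds(l_PP, l_BP, l_NP, l_NN, l_BN, l_PN):
    """Derive the DTRS thresholds (alpha, beta, gamma) from the six losses,
    assuming l_PP <= l_BP < l_NP and l_NN <= l_BN < l_PN."""
    alpha = (l_PN - l_BN) / ((l_PN - l_BN) + (l_BP - l_PP))
    beta = (l_BN - l_NN) / ((l_BN - l_NN) + (l_NP - l_BP))
    gamma = (l_PN - l_NN) / ((l_PN - l_NN) + (l_NP - l_PP))
    return alpha, beta, gamma
```

For any losses satisfying the stated condition, the returned triple obeys 0 ≤ β < γ < α ≤ 1, so the three decision rules (P1), (B1), and (N1) partition the probability interval.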
Definition. Given a decision system DS and a neighborhood radius δ ∈ (0, 1], ∀B ⊆ AT, the neighborhood relation in terms of B can be defined as [27]

N_B = {(x, y) ∈ U × U : Δ_B(x, y) ≤ δ}. (7)

∀x ∈ U, the neighborhood of x is then defined as

δ_B(x) = {y ∈ U : Δ_B(x, y) ≤ δ}, (8)

where Δ_B is a metric function; the Euclidean distance is a widely accepted form such that

Δ_B(x, y) = sqrt(Σ_{a ∈ B} (a(x) − a(y))²).

Obviously, the neighborhood relation shown in Equation (7) is one of the binary relations: it is reflexive and symmetric. Moreover, it is not difficult to observe that the neighborhood relation produces neighborhood based information granules [41], in which samples are considered similar to the central sample. The size of the neighborhood is determined by the radius δ. If different values of δ are used, then different neighborhood relations will be generated. Generally speaking, a smaller value of δ will generate a finer neighborhood relation, while a greater value of δ will generate a coarser one.
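The neighborhood computation in Equations (7) and (8) can be sketched as follows (an illustrative Python sketch; the data layout, with samples stored as rows of numeric lists and B as a list of attribute indices, is our assumption):

```python
import math

def distance(x, y, B):
    """Euclidean distance Delta_B over the attribute subset B."""
    return math.sqrt(sum((x[a] - y[a]) ** 2 for a in B))

def neighborhood(U, i, B, delta):
    """delta_B(x_i): indices of all samples within radius delta of sample i.
    The relation is reflexive, so i is always in its own neighborhood."""
    return [j for j in range(len(U)) if distance(U[i], U[j], B) <= delta]
```

As stated above, shrinking delta can only remove samples from a neighborhood, which is the "finer relation" behavior the text describes.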
Definition. Given a decision system DS, ∀B ⊆ AT, and ∀X ⊆ U, the neighborhood lower and upper approximations of X with respect to B are defined as

N_B(X) = {x ∈ U : δ_B(x) ⊆ X};
N̄_B(X) = {x ∈ U : δ_B(x) ∩ X ≠ ∅}.

The pair [N_B(X), N̄_B(X)] is referred to as a neighborhood rough set of X.
Propositions 5 and 6 tell us that if the number of used attributes increases, then the size of the neighborhood will be reduced; it follows that the lower approximation will be expanded and the upper approximation will be narrowed.
2.3. Neighborhood Decision-Theoretic Rough Set. One of the main motivations of neighborhood decision-theoretic rough set (NDTRS) is that the classical DTRS can only be used to deal with categorical data. Therefore, to further expand the applications of DTRS, the information granule [x]_d used in DTRS is replaced by the neighborhood δ_B(x) in NDTRS; that is, ∀X ⊆ U, the conditional probability is Pro(X | δ_B(x)) = |X ∩ δ_B(x)| / |δ_B(x)|. Immediately, given a decision system DS, ∀B ⊆ AT, and ∀X ⊆ U, the neighborhood decision-theoretic lower and upper approximations of X are [16]

N_B^(α,β)(X) = {x ∈ U : Pro(X | δ_B(x)) ≥ α}; (12)
N̄_B^(α,β)(X) = {x ∈ U : Pro(X | δ_B(x)) > β}.

Correspondingly, the positive, boundary, and negative regions of X are

POS(X) = N_B^(α,β)(X);
BND(X) = N̄_B^(α,β)(X) − N_B^(α,β)(X);
NEG(X) = U − N̄_B^(α,β)(X).
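The conditional probability and the (α, β)-approximations of NDTRS can be sketched as follows (illustrative Python; `neighborhoods` maps each sample index to its neighborhood as in Equation (8), and all names are our own):

```python
def conditional_probability(X, nbr):
    """Pro(X | delta_B(x)) = |X ∩ delta_B(x)| / |delta_B(x)|."""
    return len(set(X) & set(nbr)) / len(nbr)

def ndtrs_approximations(X, neighborhoods, alpha, beta):
    """Lower approximation as in Equation (12), plus its upper counterpart."""
    lower = {i for i, nbr in neighborhoods.items()
             if conditional_probability(X, nbr) >= alpha}
    upper = {i for i, nbr in neighborhoods.items()
             if conditional_probability(X, nbr) > beta}
    return lower, upper
```

The positive, boundary, and negative regions then follow as `lower`, `upper - lower`, and the complement of `upper`, respectively.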

Pseudolabel Neighborhood Decision-Theoretic Rough Set (PLNDTRS)
Though NDTRS uses the neighborhood relation to replace the indiscernibility relation for realizing information granulation, the derived neighborhoods are still closely related to the distances which are determined by the values of the condition attributes or features. However, it is well known that the Euclidean distance does not always perform well for distinguishing samples, especially in high-dimensional data. Therefore, new techniques need to be addressed. Following the construction of the distance based neighborhood relation, that is, Equation (7), it is not difficult to observe that such a strategy does not take the labels of samples into consideration. Therefore, it is possible that samples with different labels fall into the same neighborhood. From this point of view, if the labels of samples are used, then this situation may be relieved. Nevertheless, the raw labels of samples in a decision system are not suitable for direct use, mainly because we cannot approximate the decision classes generated by the decision attribute through using the labels provided by the decision attribute itself. For this reason, we further expand the decision system with the additional information of pseudolabels. The sources of pseudolabels may be derived from clustering analysis, classification analysis, label propagation [42], and so on.
Formally, a pseudolabel decision system can be represented as PDS = ⟨U, AT, d, d'⟩, in which U, AT, and d are the same as those in a decision system DS; d' is the pseudolabel decision attribute: ∀x ∈ U, d'(x) is the pseudolabel of x that is derived from a learning approach over AT. In this paper, the learning approach is k-means clustering.
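Pseudolabel generation can be sketched with a plain k-means implementation (illustrative Python; the authors' experiments use MATLAB's k-means, so the random initialization and fixed iteration count here are our assumptions):

```python
import random

def kmeans_pseudolabels(U, k, iters=50, seed=0):
    """Assign each sample a pseudolabel in {0, ..., k-1} by Lloyd's k-means."""
    rng = random.Random(seed)
    dim = len(U[0])
    centers = [list(c) for c in rng.sample(U, k)]
    labels = [0] * len(U)
    for _ in range(iters):
        # assignment step: each sample takes the label of its nearest center
        for i, x in enumerate(U):
            labels[i] = min(range(k),
                            key=lambda c: sum((x[j] - centers[c][j]) ** 2
                                              for j in range(dim)))
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [U[i] for i in range(len(U)) if labels[i] == c]
            if members:
                centers[c] = [sum(m[j] for m in members) / len(members)
                              for j in range(dim)]
    return labels
```

As in the paper, k would be set to the number of decision classes, so that d'(x) plays the same structural role as d(x) without being derived from it.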
Since both condition attributes and pseudolabels of samples exist in the pseudolabel neighborhood decision system, we can define the following pseudolabel neighborhood relation for replacing the traditional neighborhood relation shown in Equation (7).
Definition 7. Given a pseudolabel decision system PDS and a neighborhood radius δ ∈ (0, 1], ∀B ⊆ AT, the pseudolabel neighborhood relation is defined as

PN_B = {(x, y) ∈ U × U : Δ_B(x, y) ≤ δ ∧ d'(x) = d'(y)}. (16)

Following Definition 7, ∀x ∈ U, the pseudolabel neighborhood of x is then defined as

δ'_B(x) = {y ∈ U : Δ_B(x, y) ≤ δ ∧ d'(x) = d'(y)}.

Obviously, in Definition 7, two constraints have been employed to construct the pseudolabel neighborhood relation: (1) similar to the previous neighborhood relation shown in Equation (7), the distance between two samples should be less than or equal to the given radius; (2) the two samples must have the same pseudolabel.
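The two constraints of Definition 7 can be combined in one sketch (illustrative Python; `plabels[i]` stands for the pseudolabel d'(x_i), an assumption about the data layout):

```python
import math

def pl_neighborhood(U, i, B, delta, plabels):
    """delta'_B(x_i): samples within radius delta of x_i that also share
    x_i's pseudolabel; always a subset of the plain neighborhood."""
    def dist(x, y):
        return math.sqrt(sum((x[a] - y[a]) ** 2 for a in B))
    return [j for j in range(len(U))
            if dist(U[i], U[j]) <= delta and plabels[j] == plabels[i]]
```

Because the pseudolabel test only removes candidates, the returned set is always contained in the plain δ-neighborhood, which is exactly the content of Proposition 8 below.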

Proposition 8. Given a pseudolabel decision system PDS, ∀B ⊆ AT and ∀x ∈ U, δ'_B(x) ⊆ δ_B(x).

Proof. It can be derived directly from Equations (8) and (16).
The above proposition tells us that, compared with the traditional approach, our pseudolabel strategy derives a finer neighborhood relation which provides higher discriminative performance, mainly because the samples with different pseudolabels are removed from the neighborhood.
Definition 9. Given a pseudolabel decision system PDS, ∀B ⊆ AT, and ∀X ⊆ U, the pseudolabel neighborhood lower and upper approximations of X are

PLN_B^(α,β)(X) = {x ∈ U : Pro(X | δ'_B(x)) ≥ α}; (17)
PLN̄_B^(α,β)(X) = {x ∈ U : Pro(X | δ'_B(x)) > β}.

The pair [PLN_B^(α,β)(X), PLN̄_B^(α,β)(X)] is referred to as a PLNDTRS of X. Correspondingly, the pseudolabel positive, boundary, and negative regions of X are

PLPOS^(α,β)(X) = PLN_B^(α,β)(X);
PLBND^(α,β)(X) = PLN̄_B^(α,β)(X) − PLN_B^(α,β)(X);
PLNEG^(α,β)(X) = U − PLN̄_B^(α,β)(X).

Example. For readers' convenience, we take an example to show the difference between NDTRS and PLNDTRS. Figure 3 shows a binary classification problem. The two classes of samples are represented by blue "⋅" and red "+", respectively.
(1) Let us investigate subfigure (a). Suppose that x is a testing sample for classification; based on the given radius δ, 8 samples fall into the neighborhood of x besides itself; they are x_1, ..., x_8. Obviously, samples x_1, x_4, and x_6 have the label of blue "⋅" and samples x_2, x_3, x_5, x_7, and x_8 have the label of red "+". We then conclude that x will be misclassified if the majority rule is employed, mainly because the majority rule classifies x into the class of red "+" while the real label of x is blue "⋅".
Furthermore, if the value of α is 0.3, then, by Equation (12), we can conclude that x belongs to the lower approximations of both classes. The reason is that Pro(X_1 | δ_B(x)) = 0.44 and Pro(X_2 | δ_B(x)) = 0.56, if X_1 indicates the class of blue "⋅" while X_2 indicates the class of red "+". Obviously, this is an inconsistent case and there is a conflict between the two lower approximations.
(2) Let us investigate subfigure (b), which shows both the real labels and the pseudolabels of samples. The real labels are the same as those in subfigure (a); the pseudolabels of samples are denoted by "△" and "⃝", respectively. In addition, if the value of α is 0.3, then, by Equation (17), we can observe that x belongs to the lower approximation of the class of blue "⋅" while it does not belong to the lower approximation of the class of red "+", mainly because Pro(X_1 | δ'_B(x)) = 0.75 and Pro(X_2 | δ'_B(x)) = 0.25. In other words, there is no conflict between these two lower approximations.
The above example tells us that our pseudolabel neighborhood approach can not only provide us with better classification performance, but also remove the conflict case between the results of lower approximations.

Proposition 12. Given a pseudolabel decision system PDS and two values β_1, β_2 such that β_1 ≤ β_2, PLBND^(α,β_2)(X) ⊆ PLBND^(α,β_1)(X).

Proof. ∀x ∈ PLBND^(α,β_2)(X), by Definition 9, we have Pro(X | δ'_B(x)) > β_2. Since β_1 ≤ β_2, we then have Pro(X | δ'_B(x)) > β_1; it follows that x ∈ PLBND^(α,β_1)(X); that is, PLBND^(α,β_2)(X) ⊆ PLBND^(α,β_1)(X).

The above two propositions suggest that we can also adjust the size of the positive region or the boundary region of X by modifying the values of α and β in PLNDTRS.

Attribute Reduction in PLNDTRS
4.1. Aim of Attribute Reduction. Attribute reduction [43] plays a crucial role in the development of rough set theory. Different from previous feature selection approaches in the field of machine learning, attribute reduction has a clear explanation with respect to a given constraint. For example, if the positive region is expected to be preserved, then attribute reduction may be defined as a minimal subset of the condition attributes which preserves the positive region.
Up to now, by considering various constraints, researchers have proposed many different definitions of attribute reduction [43][44][45][46][47][48]. It should be noticed that, for DTRS related attribute reductions, decision cost provides an interesting topic [11,49,50]. In other words, it is frequently expected that attribute reduction yields a subset of the condition attributes with a lower or minimal decision cost. Therefore, in the rest of this section, attribute reduction will be further explored for achieving lower decision cost based on PLNDTRS.
Firstly, given a pseudolabel decision system PDS, let us review how to calculate the decision costs. ∀B ⊆ AT, suppose that U/IND(d) = {X_1, X_2, ..., X_q}; the total decision cost is

COST_B = COST_PLPOS + COST_PLBND + COST_PLNEG,

where COST_B indicates the total decision cost, COST_PLPOS indicates the decision cost of the positive region, COST_PLBND indicates the decision cost of the boundary region, and COST_PLNEG is the decision cost of the negative region.
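One common way to compute the region-wise costs for a single decision class is the following (an illustrative Python sketch; the paper's exact Equations (20)-(23) did not survive extraction, so this standard DTRS cost formulation is an assumption):

```python
def decision_cost(probs, alpha, beta, losses):
    """Decision cost of one decision class X, accumulated per region.
    probs  : Pro(X | delta'_B(x)) for each sample x.
    losses : (l_PP, l_BP, l_NP, l_NN, l_BN, l_PN) as in Table 1."""
    l_PP, l_BP, l_NP, l_NN, l_BN, l_PN = losses
    cost_pos = cost_bnd = cost_neg = 0.0
    for p in probs:
        if p >= alpha:                       # positive region
            cost_pos += l_PP * p + l_PN * (1 - p)
        elif p > beta:                       # boundary region
            cost_bnd += l_BP * p + l_BN * (1 - p)
        else:                                # negative region
            cost_neg += l_NP * p + l_NN * (1 - p)
    return cost_pos + cost_bnd + cost_neg
```

The total cost COST_B would then sum this quantity over all decision classes X_1, ..., X_q.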
Similar to other DTRS models, the decision costs of our PLNDTRS are not always monotonic when the used condition attributes vary. For example, if the number of used condition attributes is reduced, then the derived decision costs may increase or decrease. Therefore, our definition of attribute reduction will be presented, which aims to achieve a lower decision cost by selecting a subset of the condition attributes.
Example. Table 2 is a small example of a pseudolabel decision system which contains 10 samples. All of these samples are described by five condition attributes such that AT = {a_1, a_2, a_3, a_4, a_5}. The pseudolabels of samples are obtained by the clustering technique.
Assuming that δ = 0.2 and the parameters in the loss function are λ_PP = λ_NN = 0, λ_PN = 1, λ_NP = 0.95, λ_BP = 0.16, λ_BN = 0.39, then we can calculate the two thresholds such that α = 0.7922 and β = 0.3305. The decision cost based on the whole condition attribute set AT, that is, COST_AT, is 1.2983. By computation, the decision cost based on {a_3, a_5} is 0.8633, which is lower than COST_AT. Furthermore, if we remove the attribute a_5, then the decision cost based on {a_3} is 2.0017; if we remove the attribute a_3, then the decision cost based on {a_5} is 1.4983. Obviously, both the decision cost based on {a_3} and the decision cost based on {a_5} are higher than the decision cost based on {a_3, a_5}; that is, neither a_3 nor a_5 can be removed to generate a lower decision cost. Therefore, the condition attribute set {a_3, a_5} is a reduct in Table 2.
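The two thresholds in this example can be checked directly (a short Python sketch; the mapping of the listed numbers to the individual losses is our reconstruction, chosen so that the stated α and β follow):

```python
# loss values from the example, reconstructed assignment
l_PP, l_NN = 0, 0
l_PN, l_NP, l_BP, l_BN = 1, 0.95, 0.16, 0.39

alpha = (l_PN - l_BN) / ((l_PN - l_BN) + (l_BP - l_PP))
beta = (l_BN - l_NN) / ((l_BN - l_NN) + (l_NP - l_BP))

# both agree with the values reported in the example
assert round(alpha, 4) == 0.7922
assert round(beta, 4) == 0.3305
```

This consistency check supports the reconstructed loss assignment, since no other permutation of the listed values yields both reported thresholds.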
4.2. Algorithm to Compute Reduct. Presently, in the field of rough set, it is well known that obtaining all reducts is an NP-hard problem [51,52]; it follows that many heuristic algorithms for finding reducts have been studied [45,53]. Generally speaking, a heuristic algorithm contains two core aspects: heuristic information and a searching strategy.
In a heuristic algorithm, a fitness function can be used to characterize the heuristic information. For example, for Definition 13, the decision cost COST_B should be used to construct the fitness function, because the aim of Definition 13 is to find a subset of condition attributes which derives a lower decision cost.
With regard to the searching strategies in heuristic algorithms, two approaches have been considered: one is directional searching and the other is nondirectional searching. The directional searching strategy can be further categorized into the deletion method, the addition method, and the addition-deletion method [54]. The nondirectional searching strategy is usually applied in evolutionary algorithms, swarm algorithms, and other population-based metaheuristic algorithms for optimization problems.
In the context of this paper, we will use the addition method to compute the reduct; that is, the searching strategy is directional searching. To achieve that, the fitness function for measuring the significance of the condition attribute should be presented. The detailed form is shown in the following definition.

Inputs: A pseudolabel decision system PDS. Outputs: A reduct B.
Step 1: ∀x ∈ U, obtain d'(x) by k-means clustering. // k is the same as the number of decision classes in PDS.
In the procedure of adding attributes, if COST_B ≤ COST_AT, then we stop the procedure and output the final reduct B. The detailed process is shown in Algorithm 1 [16,49].
In Algorithm 1, following the basic structure of k-means clustering, the time complexity of computing the pseudolabels of samples in Step 1 is O(k · t · |U|), in which k is the number of clusters and t is the number of iterations. The time complexity of Step 2 is O(|U|² · |AT|) because any two samples in U must be compared over the condition attributes to generate the neighborhoods.
Step 4 is the iterative process of adding attributes into the reduct. The time complexity of Step 4 is O(|U|² · |AT|²) since, in the worst case, the loop is executed |AT| times. To sum up, we can conclude that the whole time complexity of Algorithm 1 is O(k · t · |U| + |U|² · |AT|²).
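The addition strategy of Algorithm 1 can be sketched abstractly as follows (illustrative Python; `cost_of` stands for the COST evaluation of a candidate attribute subset, and the stopping rule follows the text above — the full algorithm, with pseudolabel regeneration per iteration, is not reproduced):

```python
def greedy_reduct(attributes, cost_of):
    """Addition method: repeatedly add the attribute whose inclusion gives
    the lowest cost; stop once the subset's cost is at most COST_AT."""
    full_cost = cost_of(frozenset(attributes))
    reduct, remaining = set(), set(attributes)
    while remaining:
        best = min(remaining, key=lambda a: cost_of(frozenset(reduct | {a})))
        reduct.add(best)
        remaining.remove(best)
        if cost_of(frozenset(reduct)) <= full_cost:
            break
    return reduct
```

Wrapping candidate subsets in `frozenset` makes them hashable, so a real implementation could memoize the expensive cost evaluations.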

Experimental Analyses
In this section, to evaluate the performance of PLNDTRS and corresponding algorithm to compute reduct, 15 UCI data sets have been selected to conduct the experiments. The details of these data are shown in Table 3. All the experiments have been carried out on a personal computer with Windows 10, Intel Core i5 3230M CPU (2.60GHz), and 4.00 GB memory. The programming platform is MATLAB R2018a.
In the experiments, k-means clustering is employed to produce the pseudolabels, and the value of k is the number of decision classes. For all the experiments in this section, 10 different values of δ have been used: δ = 0.03, 0.06, ..., 0.3. Moreover, not only is 10-fold cross-validation employed in our experiments, but also 10 different groups of loss functions are generated randomly for each cross-validation. Therefore, for each radius δ, each data set is tested 100 times.

5.1. Experiments on Raw Attributes.
In what follows, the decision costs obtained by NDTRS and PLNDTRS in terms of raw attributes will be compared. The forms of these decision costs have been shown in Equations (20)-(23). The detailed results are shown in Table 4.
Through a careful investigation of Table 4, it is not difficult to draw the following conclusion.
For all the data sets, the total decision costs generated by PLNDTRS are lower than those generated by NDTRS. This is mainly because (1) most of the decision costs of the PLNDTRS based positive region are lower than those of the NDTRS based positive region; (2) most of the decision costs of the PLNDTRS based boundary region are lower than those of the NDTRS based boundary region. From this point of view, compared with NDTRS, our pseudolabel strategy does work for reducing the decision costs.
5.2. Experiments on Attribute Reduction. In this subsection, to show the efficiency of the proposed attribute reduction shown in Definition 13, the decision costs derived by the obtained reducts will be compared. Figure 4 shows the detailed change trend lines of the decision costs. It should be noticed that the decision costs shown in Figure 4 are total decision costs instead of the decision costs of the three different regions. In each subfigure of Figure 4, the x-coordinate pertains to the value of δ, whereas the y-coordinate concerns the values of the costs.
The detailed legends used in Figure 4 are (1) Raw-Attributes: NDTRS based on the original attributes; (2) NDTRS-Reduct: NDTRS based on the attribute reduction proposed in [16] (similar to our Definition 13, the aim of this attribute reduction is also to find a minimal subset of the condition attributes which can decrease the total decision cost); (3) PLRaw-Attributes: PLNDTRS based on the original attributes; (4) PLNDTRS-Reduct: PLNDTRS based on the attribute reduction proposed in Definition 13.
With a deep investigation of Figure 4, it is not difficult to observe the following.
(1) Compared with Raw-Attributes, PLRaw-Attributes may offer lower decision costs. This observation is consistent with what has been addressed in Section 5.1; that is, our pseudolabel approach does decrease the total decision costs.
(2) Compared with NDTRS-Reduct, PLNDTRS-Reduct also provides lower decision costs. In other words, through attribute reduction, our pseudolabel approach can find subsets of attributes which offer smaller decision costs, though the aims of the attribute reductions based on NDTRS and PLNDTRS are the same.
(3) In most data sets, both NDTRS-Reduct and PLNDTRS-Reduct do reduce the decision costs. That is to say, both the attribute reduction proposed by Li et al. in [16] and ours are effective.
(4) With the increase of the value of δ, in most data sets, the four different types of decision costs will increase. This is mainly because, if the value of δ is greater, then the size of the neighborhood will be larger; it follows that the positive region shrinks and the negative region expands. Correspondingly, the decision costs with respect to the different regions change.
Furthermore, the decision costs related to three different regions are shown in Table 5.
With a careful investigation of Table 5, we notice that the reduction of the total decision costs shown in Figure 4 can also be traced to the decision costs of the three different regions. Furthermore, to further demonstrate the effectiveness of our attribute reduction based on PLNDTRS, we compare it with other popular attribute reduction methods in DTRS: (1) POS-Reduct (positive region extension based attribute reduction [8,14,49]) and (2) NON-NEG-Reduct (nonnegative region extension based attribute reduction [49]). The details are shown in Figure 5.
By Figure 5, it is not difficult to observe the following.
(1) Generally speaking, if the value of δ increases, then the decision costs obtained by the four different reducts increase. In other words, the performances of the different reducts are closely related to the scales of the radii.
(2) In most cases, the decision costs derived from "PLNDTRS-Reduct" are lower than those derived from "NDTRS-Reduct," "POS-Reduct," and "NON-NEG-Reduct." Take the "Yeast" data set as an example: if δ = 0.12, then the decision cost related to "PLNDTRS-Reduct" is 105, while the decision cost related to "NDTRS-Reduct" is 120, the decision cost related to "POS-Reduct" is 121, and the decision cost related to "NON-NEG-Reduct" is 138. From this point of view, we can say that our pseudolabel strategy is superior to several previous research results on attribute reduction in DTRS.
Finally, we will also compare the lengths of different reducts and the time consumptions for computing these reducts. The details are shown in Tables 6 and 7.
Following Tables 6 and 7, it is not difficult to observe that more time is required to compute PLNDTRS-Reduct. The main reason includes two aspects: (1) for each iteration in computing the reduct, the pseudolabels of samples should be regenerated, as pointed out in Step 1 of Algorithm 1; (2) based on Table 6, more attributes are required to construct the reducts, which indicates that more iterations must be executed. Moreover, it must be emphasized that, though the time consumptions of calculating NON-NEG-Reduct and POS-Reduct are lower, these two types of reducts may not be good enough to provide smaller decision costs. This fact can be observed in Figure 5 clearly. From this point of view, our pseudolabel strategy based attribute reduction is superior to previous research, although our approach requires more time to obtain the reduct.
5.3. Statistical Comparisons of Reducts. In this subsection, we make statistical comparisons between PLNDTRS-Reduct and NDTRS-Reduct. The Wilcoxon signed rank test [55] is selected for comparing the two reducts. The purpose of this computation is to try to reject the null hypothesis that the two reducts perform equally well.
For each data set, we have used 10 different radii to obtain reducts; it follows that 10 decision costs are generated by each algorithm. Take the data set "Caffeine Consumption" for instance: the 10 total decision costs derived from NDTRS-Reduct and those derived from PLNDTRS-Reduct form the paired samples for the test. The detailed p-values are shown in Table 8.
Following the results of Table 8, we notice that most of the p-values are lower than 0.05; therefore, at the significance level of 0.05, we reject the null hypothesis. In other words, from the viewpoint of costs, reducts based on NDTRS and those based on our PLNDTRS do not perform equally well, though the aims of the two reducts are the same.
Remark. Most of the p-values for the "costs of negative region" are equal to 1. This is mainly because, based on NDTRS-Reduct and PLNDTRS-Reduct, the decision costs of the negative region are 0 in most cases.

Conclusions and Further Perspectives
By considering the label information of samples, a framework based on the pseudolabel strategy has been introduced into the model of decision-theoretic rough set. Different from the traditional constructions of decision-theoretic rough set, our approach relies not only on the distance based neighborhood, but also on the pseudolabels of samples. The experimental results have demonstrated that our pseudolabel approach can reduce the decision costs which are closely related to decision-theoretic rough set. Moreover, the attribute reduction based on our pseudolabel strategy can also provide attributes with better performance when decision costs are taken into consideration.
The following topics are challenges for further research.
(1) Only the k-means clustering approach is used to generate the pseudolabels of samples; label propagation or supervised approaches will be further explored. (2) Based on the pseudolabel strategy, how to design a fast process to compute reducts for large-scale data [56][57][58] is another interesting topic to be addressed.

Data Availability
The UCI data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.