Fuzzy Set-Valued Information Systems and the Algorithm of Filling Missing Values for Incomplete Information Systems

How to effectively deal with missing values in incomplete information systems (IISs) according to the research target is still a key issue in the study of IISs. If the missing values in an IIS are not handled properly, they destroy the internal connections of the data and reduce the efficiency of data usage. In this paper, in order to establish effective methods for filling missing values, we propose a new information system, namely, the fuzzy set-valued information system (FSvIS). By means of similarity measures of fuzzy sets, we obtain several binary relations in FSvISs, and we investigate the relationships among them. This is a foundation for research on FSvISs in terms of the rough set approach. Then, we provide an algorithm to fill the missing values in IISs with fuzzy set values. In fact, this algorithm transforms an IIS into an FSvIS. Furthermore, we construct an algorithm to fill the missing values in IISs with set values (or real values). The effectiveness of these algorithms is analyzed. The results show that the proposed algorithms achieve a higher correct rate than traditional algorithms and have good stability. Finally, we discuss the importance of these algorithms for investigating IISs from the viewpoint of rough set theory.


Introduction
The classical rough set model [1] can be used to deal with complete information systems. In practice, however, the lack of some data in IISs [2][3][4][5][6][7][8][9] is inevitable. For example, because the data collection process may be imperfect, human or objective conditions may result in data loss or unavailability. For data mining, these missing data may have a very important impact on the final decision. Therefore, how to infer unknown information from known information has important theoretical and practical significance.
Kryszkiewicz [10] defined a tolerance relation in IISs in order to investigate them by the rough set approach. This tolerance relation assumes, from an optimistic perspective, that a missing attribute value in an IIS can be represented by the set of all possible values of the corresponding attribute. Based on Kryszkiewicz's research, Leung and Li [11] presented a method for obtaining the relative reduction in IISs. Subsequently, Stefanowski and Tsoukias [9] established a new rough set model based on other relations in IISs.
The authors of [8,[11][12][13][14][15][16][17] gave different methods to induce binary relations from IISs and studied IISs by means of rough set theory. They had two main ways to treat the missing values: one was to delete the missing values, and the other was to take the missing values as generic values.
Based on probability theory, Yuan et al. [18] filled the missing values in IISs by finding the sample that is closest to the missing-data sample in terms of Euclidean distance and correlation. Chen and Shao [19] used the Jackknife variance estimate to investigate the missing values. In addition, there are other methods to handle missing values in IISs. Wang et al. [20] addressed the missing values in IISs by means of the Hopfield neural network approach. Salama et al. [21] proposed a topological method to retrieve missing values in IISs. Clearly, these methods of filling missing values were founded on other theories, such as neural networks and topology. In this paper, we establish a new method to fill missing values by means of rough set theory. Next, we state the motivation for this method. We know that the indiscernibility relation is a basic concept in rough set theory. Given a complete information system, we can establish an indiscernibility relation: two objects are viewed as indiscernible if they have the same value for each attribute.
Therefore, we take the view that if two objects share the same values on more attributes, then they have a higher degree of indiscernibility. Based on this observation, we provide a method to fill missing values. By using this method, we can convert the missing values into fuzzy set values by evaluating the relationship between the attribute values of different objects, and we can then transform the fuzzy set values into set values or real values according to the principle of maximum membership degree in fuzzy set theory. It is worth noting that, in order to construct this method, we establish a new information system, namely, the fuzzy set-valued information system (FSvIS), which plays an important role in the method. The rest of this paper is organized as follows. In Section 2, some basic concepts and notations of rough sets and fuzzy sets are given. In Section 3, we propose the fuzzy set-valued information system (FSvIS), and we induce some binary relations from FSvISs. Furthermore, we investigate the connections between these binary relations. In Section 4, we provide two methods of filling missing values: one fills missing values with fuzzy set values, and the other fills missing values with set values (or real values). In Section 5, we perform several experiments to analyze the effectiveness of the proposed methods. In Section 6, we apply the proposed methods of filling missing values to investigate IISs. Section 7 concludes this paper.

Basic Concepts and Properties
In this section, we review some basic concepts and notations in rough sets and fuzzy sets.

Basic Concepts for Rough Sets.
In this subsection, we review some basic concepts related to general binary relations and information systems [22][23][24].
Definition 1 (see [23]). A binary relation R on a nonempty set U is a subset of U × U. R is called
(1) Reflexive, if for any x ∈ U, (x, x) ∈ R
(2) Symmetric, if for any x, y ∈ U, (x, y) ∈ R implies (y, x) ∈ R
(3) Transitive, if for any x, y, z ∈ U, (x, y) ∈ R and (y, z) ∈ R imply (x, z) ∈ R
Generally, if R is reflexive and symmetric, it is called a similarity relation; if R is reflexive, symmetric, and transitive, it is called an equivalence relation.
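These properties can be checked mechanically; the sketch below represents a relation as a set of ordered pairs (all names here are ours, not from the paper):

```python
def is_reflexive(U, R):
    # (x, x) must belong to R for every x in U
    return all((x, x) in R for x in U)

def is_symmetric(R):
    # (x, y) in R must imply (y, x) in R
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    # (x, y) in R and (y, z) in R must imply (x, z) in R
    return all((x, z) in R
               for (x, y) in R for (y2, z) in R if y == y2)

U = {1, 2, 3}
# Reflexive and symmetric but not transitive: (1,2) and (2,3) hold while (1,3)
# does not, so R is a similarity relation that is not an equivalence relation.
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1), (2, 3), (3, 2)}
print(is_reflexive(U, R), is_symmetric(R), is_transitive(R))  # True True False
```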
Let R be a general binary relation on U. For x ∈ U, the successor neighbourhood R(x) of x with respect to R is defined by R(x) = {y ∈ U : (x, y) ∈ R}.
A triple (U, Att, V) is called an information system, where U is a finite nonempty set of objects called the universe, Att is a finite nonempty set of attributes, and V = ∪_{a ∈ Att} V_a, where V_a, called the domain of a, is a nonempty set of values of the attribute a ∈ Att. If there exist x ∈ U and a ∈ Att such that the value a(x) of x under a is a missing value (a null or unknown value), denoted by "*", that is, ∃a ∈ Att, * ∈ V_a, then the information system is called an incomplete information system (IIS).
In order to investigate an IIS using the rough set approach, Kryszkiewicz [13] presented a way to induce a relation in the IIS (U, Att, V) as follows, for B ⊆ Att:
T_B = {(x, y) ∈ U × U : ∀a ∈ B, a(x) = a(y) or a(x) = * or a(y) = *}. (2)
It is easy to check that T_B is reflexive and symmetric; that is to say, T_B is a similarity relation on U.
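A minimal sketch of this tolerance relation follows; the toy table and attribute names are illustrative, not taken from the paper:

```python
MISSING = '*'

def tolerance_relation(table, B):
    """Kryszkiewicz's T_B: (x, y) is related iff, for every attribute a in B,
    a(x) = a(y) or at least one of the two values is missing."""
    U = list(table)
    return {(x, y) for x in U for y in U
            if all(table[x][a] == table[y][a]
                   or MISSING in (table[x][a], table[y][a])
                   for a in B)}

# Illustrative IIS with one missing value
table = {
    'x1': {'a1': 'L', 'a2': 'H'},
    'x2': {'a1': MISSING, 'a2': 'H'},
    'x3': {'a1': 'N', 'a2': 'L'},
}
T = tolerance_relation(table, ['a1', 'a2'])
print(('x1', 'x2') in T, ('x1', 'x3') in T)  # True False
```

Note that the relation is reflexive and symmetric by construction, but not transitive in general, exactly as stated above.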
In this paper, we call (U, R) a generalized approximation space, where R is a binary relation on a finite nonempty set U.
Definition 2 (see [1]). Given a generalized approximation space (U, R) and X ⊆ U, the lower approximation and upper approximation of X are defined as follows:
R̲(X) = {x ∈ U : R(x) ⊆ X}, R̄(X) = {x ∈ U : R(x) ∩ X ≠ ∅}.
In [23], Wang et al. constructed an uncertainty measure in generalized approximation spaces, which is defined as follows.
Definition 3. Let (U, R) be a generalized approximation space. The entropy of R is defined as
H(R) = −(1/|U|) ∑_{x ∈ U} log₂(|R(x)|/|U|).
Proposition 1 (see [23]). Let R₁ and R₂ be binary relations on U. If R₁ ⊆ R₂, then H(R₁) ≥ H(R₂).
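Under Definition 2, and with the entropy of Definition 3 written as H(R) = −(1/|U|) Σ_x log₂(|R(x)|/|U|) (our rendering of the measure), the computations can be sketched as follows; the universe and relation are illustrative:

```python
import math

def successor(R, x):
    # R(x) = {y : (x, y) in R}
    return {y for (u, y) in R if u == x}

def lower_approx(U, R, X):
    # x belongs to the lower approximation iff R(x) is contained in X
    return {x for x in U if successor(R, x) <= X}

def upper_approx(U, R, X):
    # x belongs to the upper approximation iff R(x) intersects X
    return {x for x in U if successor(R, x) & X}

def entropy(U, R):
    # H(R) = -(1/|U|) * sum over x of log2(|R(x)| / |U|)
    n = len(U)
    return -sum(math.log2(len(successor(R, x)) / n) for x in U) / n

U = {1, 2, 3, 4}
R = {(x, y) for x in U for y in U if (x <= 2) == (y <= 2)}  # classes {1,2} and {3,4}
X = {1, 2, 3}
print(lower_approx(U, R, X), upper_approx(U, R, X), entropy(U, R))
```

The example also illustrates Proposition 1: coarsening R (enlarging neighbourhoods) can only decrease the entropy.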

Basic Concepts for Fuzzy Sets.
In this subsection, we introduce some basic concepts and measures for fuzzy sets. A fuzzy subset A of a nonempty set U is a map from U to [0, 1] [25]. The collection of all fuzzy subsets of U is denoted by F(U). The similarity measure is an important concept in fuzzy set theory, and it is defined as follows.
Definition 4 (see [26]). A function S : F(U) × F(U) ⟶ [0, 1] is called a similarity measure on F(U) if S satisfies, in particular, the following properties:
(1) S(A, A) = 1 for every A ∈ F(U)
(2) S(A, B) = S(B, A) for all A, B ∈ F(U)
Let U = {x₁, x₂, . . ., x_n} and A, B ∈ F(U).
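For instance, the min–max ratio S(A, B) = Σ min(A(x), B(x)) / Σ max(A(x), B(x)) is one well-known similarity measure; we use it below purely as an illustration (the paper's concrete measures are not reproduced here):

```python
def sim_minmax(A, B, U):
    # Fuzzy sets are dicts mapping elements of U to membership degrees in [0, 1].
    num = sum(min(A.get(x, 0.0), B.get(x, 0.0)) for x in U)
    den = sum(max(A.get(x, 0.0), B.get(x, 0.0)) for x in U)
    return 1.0 if den == 0 else num / den

U = ['u1', 'u2', 'u3']
A = {'u1': 0.2, 'u2': 0.8, 'u3': 0.5}
B = {'u1': 0.4, 'u2': 0.8, 'u3': 0.1}
print(sim_minmax(A, A, U))                          # 1.0: a set is fully similar to itself
print(sim_minmax(A, B, U) == sim_minmax(B, A, U))   # True: the measure is symmetric
```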

Fuzzy Set-Valued Information Systems (FSvISs)
In this section, we replace the real numbers in a real-valued information system with fuzzy sets and propose a more general information system, namely, the fuzzy set-valued information system. It can be seen as a generalization of the probabilistic set-valued information system defined by Huang et al. [29].

Definition 5.
A fuzzy set-valued information system (FSvIS) is a triple (U, Att, F(V)), where U is a nonempty set, Att is a set of attributes, and V is the basic set of attribute values. In addition, for all a ∈ Att and x ∈ U, the value a(x) of x under a is a fuzzy subset of V, that is, a(x) ∈ F(V).
In some cases, if the attribute values are uncertain or missing, then it is reasonable to describe them with fuzzy set values. For example, in IISs, we may fill the missing values with fuzzy set values. In this paper, we investigate IISs by means of FSvISs. In Table 1, a_1(x_1) = (0.23/−1) + (0.42/0) + (0.76/1) represents the value of the object x_1 under the attribute a_1, and (a_1(x_1))(−1) = 0.23 is the grade of membership of −1 in a_1(x_1).

The Similarity Relations in FSvISs.
The rough set approach is applied for rule extraction and attribute reduction in information systems. The key problem is how to construct binary relations from information systems. Next, we establish some similarity relations in FSvISs. Then, we establish the relationships between them.
It is well known that, in fuzzy set theory, similarity measure is an important concept to evaluate the similarity degree between fuzzy sets.
There is a common method to construct a binary relation in terms of a similarity measure as follows:
R^{S_λ}_B = {(x, y) ∈ U × U : S(a(x), a(y)) ≥ λ for every a ∈ B}, (8)
where ∅ ≠ B ⊆ Att and λ ∈ (0, 1]. Clearly, R^{S_λ}_B is a binary relation on U. The successor neighbourhood of x ∈ U can be computed as R^{S_λ}_B(x) = {y ∈ U : S(a(x), a(y)) ≥ λ for every a ∈ B}. In the following, we assume B ≠ ∅. By (1) of Definition 4, R^{S_λ}_B is reflexive. In addition, the symmetry of R^{S_λ}_B is clear. Therefore, the following result is obvious.
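Thresholding any similarity measure S in this way can be sketched as follows; the measure and the toy data are our illustrations, with attribute values taken to be fuzzy sets over a fixed value domain:

```python
V = [-1, 0, 1]  # value domain of the illustrative attribute

def S(A, B):
    # Illustrative min-max similarity measure on fuzzy sets over V
    den = sum(max(A[v], B[v]) for v in V)
    return 1.0 if den == 0 else sum(min(A[v], B[v]) for v in V) / den

def similarity_relation(table, B_attrs, S, lam):
    # (x, y) is related iff S(a(x), a(y)) >= lam for every attribute a in B_attrs
    U = list(table)
    return {(x, y) for x in U for y in U
            if all(S(table[x][a], table[y][a]) >= lam for a in B_attrs)}

table = {
    'x1': {'a1': {-1: 0.2, 0: 0.4, 1: 0.8}},
    'x2': {'a1': {-1: 0.2, 0: 0.4, 1: 0.7}},
    'x3': {'a1': {-1: 0.9, 0: 0.1, 1: 0.0}},
}
R = similarity_relation(table, ['a1'], S, lam=0.8)
print(('x1', 'x2') in R, ('x1', 'x3') in R)  # True False
```

Because S(A, A) = 1 ≥ λ and S is symmetric, the resulting relation is reflexive and symmetric, matching the discussion above.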

Proof
(1) We may assume that U = {x_1, x_2, . . ., x_n}. Let It is easy to verify that that is, In addition, it is clear that Therefore, by equations (11) and (12), we have that Thus, by equations (5) and (7), we conclude that S_N(A_1, (5) and (6)), we have that where Next, we shall prove that the conclusion is true when n = k + 1. By equations (5) and (6), we only need to prove that that is, For simplicity, we write Hence, we only need to prove that In addition, equation (17) can be written as

Complexity
By equation (21), it is clear that the inequality holds. This completes the proof.

□
According to Proposition 4 and equation (8), the following result is obvious. Let λ ∈ (0, 1]. Then, the following statements hold:

The Uncertainty Measures of FSvISs.
In Section 3.1, we established three similarity relations in FSvISs. If we use the rough set approach to investigate FSvISs, we usually need to choose reasonable similarity relations according to the actual conditions. Therefore, in this section, we discuss the uncertainty measures of these similarity relations so as to provide evidence for the choice of similarity relations.

Algorithms of Filling Missing Values in IISs
We know that complete information systems can be investigated by the rough set approach. In general, in order to discuss an IIS by means of rough set theory, we need to fill the missing values in the IIS; that is to say, we first need to transform the IIS into a complete information system. In this section, we provide some methods to fill missing values in IISs. Note that data are often divided into two types: discrete data and continuous data. Accordingly, we study the issue of filling missing data in these two cases.

Algorithm of Filling Missing Values in IISs of Discrete Data.
Clearly, missing values possess the property of uncertainty; therefore, it is reasonable to use fuzzy set values (or set values) to fill missing values in IISs. In this section, we provide two schemes, namely, replacing the missing values with fuzzy set values and replacing the missing values with set values.

Filling the Missing Values with Fuzzy Set Values.
Next, we provide a method to fill missing values in IISs of discrete data. We replace the missing values with fuzzy set values. In fact, this method can transform IISs into FSvISs.
In the IIS given by Table 2, the value domain of a_1 is {L, H, N, *}, and the value of x_2 under the attribute a_1 is missing, that is, a_1(x_2) = *. This missing value may be L, H, or N. We cannot determine which one a_1(x_2) is, but we can find a way to evaluate the degree to which L (or H or N) is a_1(x_2). That is, we can replace the missing value with a fuzzy set on {L, H, N}. Next, we outline the main idea of filling missing data. The indiscernibility relation is a basic concept in rough set theory. Given a complete information system, we can establish an indiscernibility relation: two objects are viewed as indiscernible if they have the same value for each attribute. Therefore, if two objects share the same values on more attributes, then they have a higher degree of indiscernibility. For example, in Table 2, a_1(x_2) = *; x_2 and x_4 have the same values on five attributes (a_2, a_3, a_4, a_5, a_6), while x_2 and x_3 have the same values on two attributes (a_2, a_6). Thus, x_2 and x_4 have the higher degree of indiscernibility. That is to say, the possibility degree of a_1(x_2) = a_1(x_4) = H is greater than that of a_1(x_2) = a_1(x_3) = N. Based on this observation, we obtain Algorithm 1.

Remark 3. In Step 2 of Algorithm 1, D(x_l, x_i) describes how many attributes of x_l and x_i have the same value. Thus, it can be used to characterize the degree of indiscernibility of x_l and x_i. In Step 3, |t_{a_k}|/|U| can be considered as the probability, over U, of the elements whose attribute value is t.

Example 2. In Table 3, the value a_1(x_5) is missing.
Step 1:
Step 2: We can compute the membership degrees. Therefore, we fill the missing value a_1(x_5) with the following fuzzy set:

Filling the Missing Values with Set Values.
Based on the discussion of Section 4.1.1, we can replace a missing value with a fuzzy set. In fact, we can transform the fuzzy set into a set by means of the maximum membership degree law. Let (U, Att, V) be an IIS of discrete data. Assume that a_k(x_l) = *, where x_l ∈ U and a_k ∈ Att. By Algorithm 1, we obtain the fuzzy set a^F_k(x_l). Thus, we can use the following set to fill the missing value a_k(x_l):
a^S_k(x_l) = {t ∈ V*_{a_k} : (a^F_k(x_l))(t) = M}, where M is the maximal membership degree of a^F_k(x_l). (25)
In Example 2, the maximal membership degree M is 2/3, that is, M = 2/3. By equation (25), we have that a^S_1(x_5) = {N}. That is to say, we can fill the missing value a_1(x_5) with the set {N}. In Table 4, we know that a_1(x_5) should be N. This coincides with the value filled by our algorithm.
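The conversion by the maximum membership degree law can be sketched directly (the function name is ours):

```python
def fuzzy_to_set(fuzzy):
    # Keep every candidate whose membership attains the maximum degree M.
    M = max(fuzzy.values())
    return {t for t, mu in fuzzy.items() if mu == M}

print(fuzzy_to_set({'L': 1/3, 'H': 1/3, 'N': 2/3}))   # {'N'}
print(fuzzy_to_set({'L': 0.5, 'H': 0.5, 'N': 0.2}))   # ties keep both candidates
```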
In Table 3, a_1(x_9) and a_2(x_7) are also missing. By Algorithm 1, we can obtain the corresponding fuzzy sets.

Algorithm of Filling Missing Values in IISs of Continuous Data.
Similar to the discussion of Section 4.1, we investigate the corresponding issues for IISs of continuous data in this section.

Filling the Missing Values with Fuzzy Set Values.
Similar to Algorithm 1, we give Algorithm 2 to fill the missing values in IISs of continuous data.
Example 4. In this example, we discuss the Iris information system given by Table 5 from UCI. Suppose that a 1 (x 5 ) and a 2 (x 7 ) in Table 5 are missing. We obtain Table 6. Next, we use the IIS given by Table 6 to illustrate Algorithm 2.
Step 1: Take 4.3 ∈ V*_{a_1}. It is easy to compute that the corresponding membership degree is 0.25, and that (a^F_1(x_5))(6.6) = 0. Therefore, we fill the missing value a_1(x_5) with the following fuzzy set:
As in Section 4.1.2, we can transform the fuzzy set into a real value by means of the maximum membership degree law. Let (U, Att, V) be an IIS of continuous data. Assume that a_k(x_l) = *, where x_l ∈ U and a_k ∈ Att. By Algorithm 2, we obtain the fuzzy set a^F_k(x_l). Thus, we can use the following real value to fill the missing value a_k(x_l):
By equation (28) and Table 5, we know that a_1(x_5) should be 5.1. In Example 5, we fill the value a^R_1(x_5) = 5.3 under the assumption that a_1(x_5) and a_2(x_7) are missing. The deviation between a_1(x_5) = 5.1 and a^R_1(x_5) = 5.3 is within 0.2. This indicates that the method of filling missing values is effective.
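For the continuous case, the analogous conversion to a single real value can be sketched as follows (the tie-breaking convention of averaging maximizers is our assumption; the paper's rule is not shown here):

```python
def fuzzy_to_real(fuzzy):
    # Fill with the candidate value of maximal membership; ties are averaged.
    M = max(fuzzy.values())
    winners = [t for t, mu in fuzzy.items() if mu == M]
    return sum(winners) / len(winners)

# Candidates taken from the observed values of the attribute (illustrative degrees)
print(fuzzy_to_real({4.3: 0.25, 5.3: 0.75, 6.6: 0.0}))  # 5.3
```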

Experiments and Effectiveness Analysis
In this section, we employ several experiments to show the effectiveness of the algorithms given in Section 4. We compare the proposed methods with representative algorithms. The summary information of the experimental datasets is shown in Table 7. The Adult dataset and the Abalone dataset are taken from UCI (http://archive.ics.uci.edu/ml/datasets.php).

Effectiveness Analysis of the Algorithm of Filling Missing Values in IISs of Discrete Data.
In this part, we conduct two groups of experiments. They are used to compare the effectiveness of the methods of filling missing values from different points of view. The frequency-estimator-based filling method (Algorithm FE) [30] is a common method of filling missing data. In this section, a comparison of the proposed methods with Algorithm FE is given.
Let (U, Att, V) be an IIS of discrete data. Assume that a_k(x_l) = *, where x_l ∈ U and a_k ∈ Att. V*_{a_k} denotes the set V_{a_k} − {*}. We shall use a fuzzy set on V*_{a_k} to represent the missing value a_k(x_l), and we denote this fuzzy set by a^F_k(x_l). Thus, for all t ∈ V*_{a_k}, we need to compute the membership degree (a^F_k(x_l))(t). The steps of filling the missing value a_k(x_l) are given in Algorithm 1 (filling the missing values in IISs of discrete data with fuzzy set values).
Next, we provide a comparative study of the effectiveness of Algorithms FMvSV and FE. We first give a quantitative index of the effectiveness of filling missing values as follows.
Definition 6. Given a complete information system (U, Att, V) of discrete data, suppose that the values a_{k_1}(x_{w_1}), a_{k_2}(x_{w_2}), . . ., a_{k_q}(x_{w_q}) are missing and that the filled set values are denoted by a^S_{k_1}(x_{w_1}), a^S_{k_2}(x_{w_2}), . . ., a^S_{k_q}(x_{w_q}), respectively. Then, the correct rate of filling values is defined by
CR = (1/q) ∑_{i=1}^{q} p_i,
where q is the number of missing values and p_i measures whether the true value a_{k_i}(x_{w_i}) falls in the filled set a^S_{k_i}(x_{w_i}).
Example 6. In Table 4, a_1(x_5) = N, a_1(x_9) = N, and a_2(x_7) = L. In Example 3, suppose that a_1(x_5), a_1(x_9), and a_2(x_7) in Table 4 are missing. Then, by Definition 6, we can compute that the correct rate of filling values is CR = (p_1 + p_2 + p_3)/3 = 0.778.
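One reading of Definition 6 gives partial credit p_i = 1/|a^S_i| when the true value lies in the filled set and 0 otherwise, which reproduces the 0.778 of Example 6. Under that reading (our interpretation), the computation is:

```python
def correct_rate(true_values, filled_sets):
    # CR = (1/q) * sum of p_i, with p_i = 1/|S_i| if the true value is in S_i, else 0
    ps = [(1 / len(S)) if v in S else 0.0
          for v, S in zip(true_values, filled_sets)]
    return sum(ps) / len(ps)

# Two exact hits and one hit inside a three-element set: (1 + 1 + 1/3)/3
cr = correct_rate(['N', 'N', 'L'], [{'N'}, {'N'}, {'L', 'H', 'N'}])
print(round(cr, 3))  # 0.778
```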
In this section, we use some subsets of the Adult dataset (see Table 7) in our experiments. Since we need to experiment with discrete values, we randomly select some subsets of discrete values from the Adult dataset. Table 8 gives three subsets of the Adult dataset.

Experiment 1.
The effects of experiment times on correct rates of filling values.
In this experiment, we mainly compare the efficiency of Algorithms FMvSV and FE on the dataset AD200 in Table 8. The steps are as follows:
(i) 2.5% of the attribute values are randomly selected from AD200 and supposed to be missing
(ii) By means of Algorithms FMvSV and FE, we fill these missing values and obtain the correct rate of each algorithm
Steps (i) and (ii) are repeated ten times, and the corresponding results are summarized in Table 9. Similarly, we also consider the cases of 5%, 7.5%, 10%, 12.5%, 15%, 17.5%, 20%, 22.5%, 25%, 27.5%, and 30% missing values in AD200. The results are shown in Figures 1 and 2, which show the following facts:
(i) The correct rate of Algorithm FMvSV is 10% to 20% higher than that of Algorithm FE.
(ii) Although the number of missing values is fixed, the missing values in AD200 are not necessarily the same in each experiment. The correct rates of Algorithm FMvSV change little from experiment to experiment when the number of missing values in AD200 is fixed. However, in the same setting, the correct rates of Algorithm FE fluctuate markedly in each experiment. This indicates that Algorithm FMvSV does well in terms of stability.
In Table 9, the mean value of the ten correct rates related to Algorithm FMvSV can be considered as the correct rate of Algorithm FMvSV for 2.5% missing values. The correct rate of Algorithm FE for 2.5% missing values can be obtained similarly. Furthermore, for 5%, 7.5%, 10%, 12.5%, 15%, 17.5%, 20%, 22.5%, 25%, 27.5%, and 30% missing values in AD200, we also compute the correct rates of Algorithms FMvSV and FE. The results are shown in Table 10 and Figure 3, which show that the correct rates of Algorithm FMvSV monotonically decrease as the number of missing values increases.
For the continuous case, let (U, Att, V) be an IIS of continuous data. Assume that a_k(x_l) = *, where x_l ∈ U and a_k ∈ Att, and let V*_{a_k} denote the set V_{a_k} − {*}. We shall use a fuzzy set on V*_{a_k} to represent the missing value a_k(x_l), and we denote this fuzzy set by a^F_k(x_l). Thus, for all t ∈ V*_{a_k}, we need to compute the membership degree (a^F_k(x_l))(t); the steps of filling the missing value a_k(x_l) are given in Algorithm 2.
Table 10 and Figure 3 also indicate that the effect of Algorithm FMvSV is better than that of Algorithm FE. Furthermore, even when the missing values are increased to 30%, the correct rate of Algorithm FMvSV still reaches 64%.

Experiment 2.
The effects of data size on correct rates of filling values.
In this experiment, we use AD200, AD400, and AD800 to discuss the effects of data size on the correct rates of Algorithms FMvSV and FE. For 5% to 60% missing values, similar to the calculating method of Table 10, we obtain the correct rate of Algorithm FMvSV (or FE) in each case. (The corresponding algorithm boxes take as input an IIS (U, Att, V), where U = {x_1, x_2, . . ., x_n} and Att = {a_1, a_2, . . ., a_m}, together with the missing values a_{k_1}(x_{w_1}), a_{k_2}(x_{w_2}), . . ., a_{k_q}(x_{w_q}) and, for continuous data, the thresholds λ_1, λ_2, . . ., λ_m; their outputs are the filled real values and the filled set values a^S_{k_1}(x_{w_1}), a^S_{k_2}(x_{w_2}), . . ., a^S_{k_q}(x_{w_q}), respectively.) The results are shown in Figure 4, which reflects the following facts:
(i) When the data size increases under the same missing rate, the correct rate of Algorithm FMvSV remains basically the same and is higher than that of Algorithm FE. Therefore, for Algorithm FMvSV, we can divide a dataset into several small datasets and then fill the missing values separately to improve efficiency.
(ii) It is easy to see that, as the data size increases, the difference between the correct rates of Algorithms FMvSV and FE becomes larger. This illustrates that, as the data size increases, the advantage of Algorithm FMvSV becomes more obvious; that is, Algorithm FMvSV has an advantage in processing big datasets and dynamically growing datasets.

Effectiveness Analysis of the Algorithm of Filling Missing Values in IISs of Continuous Data.
In this section, we also conduct two groups of experiments. They are again used to compare the effectiveness of the algorithms of filling missing values from different points of view. The mean-based filling method (Algorithm MEAN) [31] is a common method of filling missing data in an IIS of continuous data. Next, a comparison of the proposed methods with Algorithm MEAN is provided. In Section 4.2, we obtained the method of filling the missing values with real values. By combining Sections 4.2.1 and 4.2.2, we design Algorithm FMvRV to fill the missing values with real values (Algorithm 4).
Next, we provide a comparative study of the effectiveness of Algorithms FMvRV and MEAN. We first give a quantitative index of the effectiveness of filling missing values as follows.
Definition 7. Given a complete information system (U, Att, V) of continuous data, suppose that the values a_{k_1}(x_{w_1}), a_{k_2}(x_{w_2}), . . ., a_{k_q}(x_{w_q}) are missing and that the filled real values are denoted by a^R_{k_1}(x_{w_1}), a^R_{k_2}(x_{w_2}), . . ., a^R_{k_q}(x_{w_q}), respectively. Then, the correct rate of filling values is defined by
CRC = |{i : |a_{k_i}(x_{w_i}) − a^R_{k_i}(x_{w_i})| ≤ λ_i}| / q,
where q is the number of missing values, and λ_1, λ_2, . . ., λ_q are the thresholds corresponding to a_{k_1}, a_{k_2}, . . ., a_{k_q}.
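A direct rendering of this correct rate follows; the numbers are illustrative, chosen to give the same deviations (0.2 and 0.35) as Example 7, and a tiny tolerance is added only to guard against floating-point error:

```python
def correct_rate_continuous(true_values, filled_values, thresholds, eps=1e-9):
    # CRC = |{i : |true_i - filled_i| <= lambda_i}| / q
    hits = sum(abs(v - r) <= lam + eps
               for v, r, lam in zip(true_values, filled_values, thresholds))
    return hits / len(true_values)

# Deviations 0.2 <= 0.2 and 0.35 <= 0.5, so both fills count as correct
print(correct_rate_continuous([5.1, 3.5], [5.3, 3.15], [0.2, 0.5]))  # 1.0
```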
Example 7. From Example 5, we know that the thresholds are λ_1 = 0.2, λ_2 = λ_3 = λ_4 = 0.5, and a^R_1(x_5) = 5.3. Similarly, we can compute that a^R_2(x_7) = 3.15. It is clear that |a_1(x_5) − a^R_1(x_5)| = 0.2 ≤ λ_1 = 0.2 and |a_2(x_7) − a^R_2(x_7)| = 0.35 ≤ λ_2 = 0.5. Therefore, we can calculate the correct rate: CRC = |{(1, 5), (2, 7)}|/2 = 2/2 = 1.
In this section, we use some subsets of the Abalone dataset (see Table 7) in our experiments. Table 11 gives three subsets of the Abalone dataset.
Experiment 3. The effects of experiment times on correct rates of filling values about continuous data.
In this experiment, we mainly compare the efficiency of Algorithms FMvRV and MEAN on AB200 in Table 11. The steps are as follows:
(i) 5% of the attribute values are randomly selected from AB200 and supposed to be missing
(ii) By means of Algorithms FMvRV and MEAN, we fill these missing values and obtain the correct rate of each algorithm
Steps (i) and (ii) are repeated ten times, and the corresponding results are summarized in Table 12. Similarly, we also consider the cases of 10%, 15%, 20%, 25%, and 30% missing values in AB200.
The results are shown in Figure 5, which shows that Algorithm FMvRV is more stable than Algorithm MEAN. Furthermore, the correct rate of Algorithm FMvRV is better than that of Algorithm MEAN.
This indicates that Algorithm FMvRV can produce more accurate forecasts of missing values, which is meaningful for the correct classification of incomplete datasets.
In Table 12, the mean value of the ten correct rates related to Algorithm FMvRV can be viewed as the correct rate of Algorithm FMvRV for 5% missing values. The correct rate of Algorithm MEAN for 5% missing values can be computed similarly. Furthermore, for 10%, 15%, 20%, 25%, and 30% missing values in AB200, we also compute the correct rates of Algorithms FMvRV and MEAN. The results are shown in Figure 6, which shows that the correct rates of Algorithm FMvRV almost monotonically decrease with the increase of missing values. However, the monotonicity of the correct rates of Algorithm MEAN is not obvious. In addition, Figure 6 also indicates that the effect of Algorithm FMvRV is better than that of Algorithm MEAN. Furthermore, when the missing values are increased to 30%, the correct rate of Algorithm FMvRV is still more than 85%, whereas the correct rate of Algorithm MEAN is less than 60%. This indicates that Algorithm FMvRV is more conducive to predicting the missing values.

Experiment 4.
The effects of data size on correct rates of filling values about continuous data.
In this experiment, we use AB200, AB400, and AB800 to discuss the effects of data size on the correct rates of Algorithms FMvRV and MEAN. For 10% to 30% missing values, similar to the calculating method of Table 12, we obtain the correct rate of Algorithm FMvRV (or MEAN) in each case. The results are shown in Figure 7, which reflects the following facts:
(i) When the data size increases, the correct rate of Algorithm FMvRV is higher than that of Algorithm MEAN.
(ii) When the missing values are less than 30%, the correct rates of Algorithm FMvRV are almost unchanged and close to 90%; here the data size has little effect on the correct rates of Algorithm FMvRV. This illustrates that Algorithm FMvRV has obvious advantages in processing big datasets when the missing values are less than 30%.

Application of the Algorithms of Filling Missing Values in Investigating IISs
When we apply the rough set approach to investigate an IIS, a key step is to induce a binary relation from the IIS. We can provide three ways to obtain such a binary relation. Let (U, Att, V) be an IIS and B ⊆ Att. Then, the three ways are as follows:
(1) By equation (2), we can obtain the binary relation T_B.
(2) By Algorithm 1, we can fill the missing values in IISs with fuzzy set values. Then, we can also view the other attribute values as fuzzy set values; for example, in Table 3 of Example 2, a_1(x_1) = L, and we can see a_1(x_1) as the fuzzy set value a_1(x_1) = (1/L) + (0/H) + (0/N). Based on this discussion, we can transform an IIS into an FSvIS. Thus, according to equation (8), we can obtain the binary relation R^{S_λ}_B.
(3) In an IIS of discrete data, if the value a(x) of x under an attribute a is not missing, we can view a(x) as the set value {a(x)}. Based on this consideration, we can use Algorithm FMvSV to transform an IIS into a set-valued information system. Then, we can obtain the following binary relation [32]:
T^{sv}_B = {(x, y) ∈ U × U : ∀a ∈ B, a(x) ∩ a(y) ≠ ∅}.
In this section, through a comparative study of these binary relations induced from the same IIS, we further show that our algorithms are meaningful for the study of IISs. We choose three datasets, i.e., the Mammographic dataset, the Abalone dataset, and the Car dataset, to carry out the comparative study. The summary information of the Mammographic dataset and the Abalone dataset is shown in Table 13.
Firstly, we introduce a new measure to evaluate the similarity degree between binary relations.
Definition 8. Let R_1 and R_2 be binary relations on a nonempty set U. The similarity degree of R_1 and R_2 is defined as
SD(R_1, R_2) = |R_1 ∩ R_2| / |R_1 ∪ R_2|.
Example 8. For the Car dataset, where we take λ = 0.6, and similarly for every dataset, we compute the similarity degrees between the induced relations. The result is shown in Table 15, which reflects the following facts. The binary relation induced from a dataset can be considered as a classification result on the objects, where the elements of a successor neighbourhood with respect to the binary relation form a class. In this example, for the Breast cancer dataset, the similarity degrees between the relations are almost close to 1. This means that missing data have little impact on the classification of the Breast cancer dataset.
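One natural reading of Definition 8 is a Jaccard-style ratio |R_1 ∩ R_2|/|R_1 ∪ R_2| on the relations viewed as sets of ordered pairs; the sketch below uses that reading (our assumption):

```python
def relation_similarity(R1, R2):
    # Similarity degree of two relations viewed as sets of ordered pairs
    union = R1 | R2
    return 1.0 if not union else len(R1 & R2) / len(union)

R1 = {(1, 1), (2, 2), (1, 2)}
R2 = {(1, 1), (2, 2), (2, 1)}
print(relation_similarity(R1, R2))  # 0.5 (two shared pairs out of four)
print(relation_similarity(R1, R1))  # 1.0
```

A value near 1 means the two relations classify the objects almost identically, which is exactly how the comparison in Table 15 is interpreted.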
Thus, we may ignore these missing values when addressing this dataset. In contrast, the relations induced from the Car dataset have low similarity degrees.
This shows that missing values in the Car dataset play an important role in the classification of this dataset. A natural question is which relation is better for investigating the Car dataset. In Table 15, we can see that the similarity degree between T^{sv}_B and T_B is higher than that between T_B and R^{S_λ_N}_B. Furthermore, the similarity degree between T^{sv}_B and R^{S_λ_N}_B is higher than that between T_B and R^{S_λ_N}_B. This indicates that T^{sv}_B is a good choice for investigating the Car dataset. Note that T^{sv}_B is determined by using Algorithm FMvSV. This illustrates that Algorithm FMvSV is important for the study of IISs.
At the end of this section, we apply the uncertainty measure to estimate the importance of the proposed algorithms. In Example 8, we list three binary relations T_B, T^{sv}_B, and R^{S_λ_N}_B with respect to the Car dataset. By Definition 3, we can compute their entropies, which are shown in Table 16.
We know that entropy can measure the granularity of a binary relation. Proposition 1 shows that the finer the binary relation is, the higher its entropy is. Conversely, if the entropy of a binary relation is high, then the binary relation should be fine. Thus, Table 16 can provide more information for the study of IISs. According to the above discussion, we know that T^{sv}_B and R^{S_λ_N}_B are obtained in terms of the proposed algorithms. This illustrates that the proposed algorithms are useful for investigating IISs.
Finally, a similar discussion can also be made for continuous datasets; we omit it here.

Conclusion
This paper established the FSvIS, which is an extension of the probabilistic set-valued information system (PSvIS). By means of the FSvIS, we constructed some algorithms to fill missing values in IISs. We carried out several experiments to analyze the effectiveness of these algorithms. The experimental results indicated that these algorithms are useful for investigating IISs. There are still many interesting issues worth studying. First, we will further study the relationship between FSvISs and existing information systems and study the applications of FSvISs. Second, we can apply the uncertainty measures for fuzzy relations established in [34] to investigate the fuzzy set-valued information system defined in this paper. Finally, we will conduct a more comprehensive analysis of the impact of missing values on IISs.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.