Rough Set Approach to Incomplete Multiscale Information System

The multiscale information system is a new knowledge representation system for expressing knowledge at different levels of granulation. In this paper, we take into account unknown values, which are common in real-world applications, and investigate the incomplete multiscale information system for the first time. The descriptor technique is used to construct rough sets at different scales for analyzing hierarchically structured data. We also address the problem of unravelling decision rules at different scales. Finally, reduct descriptors are formulated to simplify the decision rules that can be derived at different scales. Several numerical examples substantiate the conceptual arguments.


Introduction
Rough set theory [3] is one of the important mathematical tools for granular computing [1,2]. It has proved useful in fields such as data mining, knowledge discovery, decision support, machine learning, and pattern recognition.
Pawlak's rough set was proposed on the basis of an indiscernibility relation, which generates a granulation space on the universe of discourse. This granulation space is a partition, because the indiscernibility relation is an equivalence relation. A variety of extended rough set models have been proposed to meet different requirements. Rough sets based on the tolerance relation [4][5][6][7], the similarity relation [8][9][10], the characteristic relation [11,12], and the neighborhood system [13] can deal with incomplete information systems. The dominance-based rough set approach [14][15][16][17][18] can deal with multicriteria decision problems. Covering-based rough sets [19][20][21][22] are constructed on the basis of a covering, which generalizes a partition of the universe. Fuzzy rough set approaches [23][24][25][26] approximate fuzzy concepts in fuzzy environments. Variable precision rough set approaches [27][28][29] tolerate some inconsistency. This allows them to handle classification problems with uncertain data, and it also relaxes the boundary definition of Pawlak's rough set to improve its applicability.
Each of the above rough sets is constructed from exactly one set of information granules, which can be generated from a binary relation or a covering. For this reason, we may call them single-granulation rough sets. In single-granulation rough sets, a partition is a granulation space, a binary neighborhood system induced by a binary relation is a granulation space, and a covering is also a granulation space. However, as noted in [30], we often need to describe a target concept concurrently from several independent environments, so problem solving requires multigranulation spaces. Qian et al. [31][32][33] therefore proposed the concept of multigranulation rough sets. Single-granulation and multigranulation rough sets differ mainly in this respect: the latter can approximate the target concept using several different sets of information granules. Each set of information granules can be considered a granulation space, and the space induced by several different sets of information granules is called a multigranulation space. For example, a family of partitions can be regarded as a partition-based multigranulation space.

The Scientific World Journal
Multigranulation rough set approaches are currently developing rapidly. Qian et al. divided their multigranulation rough sets into two categories: the optimistic case and the pessimistic case. Yang et al. [34] generalized the optimistic and pessimistic multigranulation rough sets to the fuzzy environment and proposed multigranulation fuzzy rough set models. Qian et al. also proposed the positive approximation [35,36], which can accelerate heuristic attribute reduction. The positive approximation uses a preference ordering that makes the granulation space finer step by step, with each finer granulation space obtained from the previous one. It therefore also reflects multigranulation thinking. Bittner and Smith [37,38] proposed the concept of granular partition, which may be thought of as a hierarchical family of partial equivalence relations. Khan and Banerjee [39] studied the rough set approach to multiple-source information systems, which models information arriving from multiple sources. Wu and Leung [40,41] investigated a new knowledge representation system called the multiscale information system. In such a system, the data are represented by different scales at different levels of granulation, and granular information is transformed from a finer to a coarser level of granulation.
The multiscale information system is a very important knowledge representation approach, because it lets us analyze data at different levels of granulation. For example, maps can be hierarchically organized into different scales, from large to small and vice versa. The smaller the scale, the finer the partition that can be obtained; conversely, the bigger the scale, the coarser the partition that can be obtained. However, Wu and Leung investigated only complete multiscale information systems. In his influential book Earth in the Balance [42], Gore notes that "We must acknowledge that we never have complete information. Yet we have to make decisions anyway." This quote shows that decisions about environmental issues are difficult. It also shows that making such decisions with partial information is ultimately inevitable. The investigation of incomplete multiscale information systems has therefore become a necessity. Unlike the complete case, an incomplete multiscale information system contains unknown values. As a result, the granulation space obtained at each level of granulation is not necessarily a partition; in general it is a covering. Wu and Leung's approach therefore has to be reexamined for incomplete multiscale information systems, and this reexamination is the subject of this paper.
In the next section, we introduce some basic notions related to Pawlak's rough set and the multiscale information system. Section 3 explores the incomplete multiscale information system and the rule induction problem. In Section 4, we introduce the concept of reducts into descriptors of the incomplete multiscale decision system to derive simplified decision rules. Section 5 concludes the paper with a summary and an outlook for further research.

Rough Set.
Formally, an information system [3] is a pair $S = (U, AT)$, in which (i) $U$ is a nonempty finite set of objects, called the universe; (ii) $AT$ is a nonempty finite set of attributes such that, $\forall a \in AT$, $V_a$ is the domain of attribute $a$.
$\forall x \in U$, let $a(x)$ denote the value that $x$ takes on $a$ ($a \in AT$). In an information system $S$, the relationship between objects can then be described through their attribute values. For a subset of attributes $A \subseteq AT$, the indiscernibility relation [3] is $R_A = \{(x, y) \in U \times U : a(x) = a(y), \forall a \in A\}$. It is an equivalence relation, and $[x]_A$ denotes the equivalence class of $x$. For $X \subseteq U$, let $\underline{A}(X) = \{x \in U : [x]_A \subseteq X\}$ and $\overline{A}(X) = \{x \in U : [x]_A \cap X \neq \emptyset\}$. The pair $[\underline{A}(X), \overline{A}(X)]$ is called Pawlak's rough set of $X$ with respect to the set of attributes $A$.

Multiscale Information System.
A multiscale information system [40] is a tuple $S = (U, AT)$, where (i) $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty, finite set of objects called the universe of discourse; (ii) $AT = \{a_1, \ldots, a_m\}$ is a nonempty, finite set of attributes, and each $a_j \in AT$ is a multiscale attribute. That is, attribute $a_j$ can take different values at different scales for the same object in $U$. Wu and Leung [41] assumed that all the attributes of a multiscale information system have the same number $I$ of levels of granulation. A multiscale information system can therefore be rewritten as $S = (U, \{a_j^k : k = 1, 2, \ldots, I, j = 1, 2, \ldots, m\})$, where $AT^k = \{a_j^k : j = 1, 2, \ldots, m\}$ is the set of attributes at scale $k$. The partition induced by $AT^k$ is $U/AT^k = \{[x]_{AT^k} : x \in U\}$, where $[x]_{AT^k}$ is the equivalence class that contains $x$ at scale $k$.
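The Python sketch below builds $U/AT^k$ for a small hypothetical multiscale table, which makes the scale-wise partitions concrete. The table, the attribute names `math` and `phys`, and the coarsening maps `to_grade` and `to_pass` are invented for illustration. They stand in for the finer-to-coarser transformation of granular information mentioned in the introduction and are not taken from this paper.

```python
from collections import defaultdict

# Hypothetical scale-1 values (exam marks) for six objects on two attributes.
LEVEL1 = {
    "x1": {"math": 95, "phys": 88},
    "x2": {"math": 85, "phys": 88},
    "x3": {"math": 72, "phys": 65},
    "x4": {"math": 72, "phys": 58},
    "x5": {"math": 55, "phys": 40},
    "x6": {"math": 40, "phys": 40},
}


def to_grade(mark):
    """Hypothetical surjective map from scale 1 (marks) to scale 2 (grades)."""
    return "A" if mark >= 80 else "B" if mark >= 60 else "C"


def to_pass(grade):
    """Hypothetical surjective map from scale 2 (grades) to scale 3 (pass/fail)."""
    return "pass" if grade in ("A", "B") else "fail"


def coarsen(table, g):
    """Apply the value map g to every attribute value, giving the next scale."""
    return {x: {a: g(v) for a, v in row.items()} for x, row in table.items()}


def partition(table):
    """U/AT^k: objects with identical values on all attributes share a block."""
    blocks = defaultdict(set)
    for x, row in table.items():
        blocks[tuple(sorted(row.items()))].add(x)
    return {frozenset(b) for b in blocks.values()}


def refines(p, q):
    """True if every block of p lies inside some block of q."""
    return all(any(b <= c for c in q) for b in p)


scales = [LEVEL1]
scales.append(coarsen(scales[-1], to_grade))
scales.append(coarsen(scales[-1], to_pass))
parts = [partition(t) for t in scales]
```

On this data, `parts[0]` has six singleton blocks. `parts[1]` merges x1 with x2 and x5 with x6, and `parts[2]` also merges x3 into the first block. Each partition refines the next one.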
In a multiscale information system, different scales represent different levels of granulation, so the $I$ scales form a hierarchical structure. This structure can be expressed by the inclusion relation among equivalence relations, that is, $R_{AT^1} \subseteq R_{AT^2} \subseteq \cdots \subseteq R_{AT^I}$. A multiscale decision system $S = (U, AT \cup \{d\})$ can be decomposed into $I$ decision systems $S^1, S^2, \ldots, S^I$ with the same decision $d$. Each decomposed decision system represents information at a particular level of granulation, that is, scale. In [40], Wu and Leung call a multiscale decision system consistent if and only if the decision system at the first (finest) scale, $S^1 = (U, \{a_j^1 : j = 1, 2, \ldots, m\} \cup \{d\}) = (U, AT^1 \cup \{d\})$, is consistent; otherwise, the system is inconsistent. Building on this work, we propose a more general definition of consistency in a multiscale decision system. By Definition 1, if $k = 1$, then 1-scale consistency coincides with the consistency proposed by Wu and Leung, and $R_{AT^1} \subseteq R_{\{d\}}$. Although $R_{AT^1} \subseteq R_{AT^2} \subseteq \cdots \subseteq R_{AT^I}$, the inclusions $R_{AT^2} \subseteq R_{\{d\}}, \ldots, R_{AT^I} \subseteq R_{\{d\}}$ need not hold. It follows that 1-scale consistency requires at least one of the $I$ levels of granulation to satisfy the consistency condition. This type of consistency, that is, 1-scale consistency, can be called optimistic consistency in a multiscale decision system.
Now consider $I$-scale consistency, in which $R_{AT^I} \subseteq R_{\{d\}}$. Since $R_{AT^1} \subseteq R_{AT^2} \subseteq \cdots \subseteq R_{AT^I}$, we also have $R_{AT^1} \subseteq R_{\{d\}}, \ldots, R_{AT^{I-1}} \subseteq R_{\{d\}}$. It follows that $I$-scale consistency requires all $I$ levels of granulation to satisfy the consistency condition. This type of consistency, that is, $I$-scale consistency, can be called pessimistic consistency in a multiscale decision system.
The discussion above shows that optimistic and pessimistic consistency are both special cases of $k$-scale consistency in a multiscale decision system. By Proposition 2, if a multiscale decision system is consistent at a given scale $k$, then it is also consistent at every smaller scale $l \leq k$, because $R_{AT^l} \subseteq R_{AT^k} \subseteq R_{\{d\}}$.
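A small Python sketch can check $k$-scale consistency scale by scale. It assumes, as the discussion of Definition 1 suggests, that a multiscale decision system is $k$-scale consistent exactly when every block of $U/AT^k$ lies inside a single decision class, that is, $R_{AT^k} \subseteq R_{\{d\}}$. The nested partitions and the decision values are hypothetical.

```python
# Hypothetical nested partitions U/AT^1, U/AT^2, U/AT^3 (each refines the next).
PARTITIONS = [
    [{"x1"}, {"x2"}, {"x3"}, {"x4"}, {"x5"}, {"x6"}],
    [{"x1", "x2"}, {"x3"}, {"x4"}, {"x5", "x6"}],
    [{"x1", "x2", "x3"}, {"x4"}, {"x5", "x6"}],
]
# Hypothetical decision values d(x).
DECISION = {"x1": "yes", "x2": "yes", "x3": "no",
            "x4": "no", "x5": "no", "x6": "no"}


def is_consistent(blocks, decision):
    """k-scale consistency: each block of U/AT^k carries a single decision value."""
    return all(len({decision[x] for x in block}) == 1 for block in blocks)


flags = [is_consistent(p, DECISION) for p in PARTITIONS]
# Optimistic (1-scale) and pessimistic (I-scale) consistency are the two extremes.
optimistic, pessimistic = flags[0], flags[-1]
```

With these values, `flags` is `[True, True, False]`, so the system is optimistically but not pessimistically consistent. Consistency at scale 2 carries down to scale 1. The step from scale 2 to scale 3 shows that the converse need not hold.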

Remark 3.
The converse of Proposition 2 does not always hold: a multiscale decision system that is consistent at a smaller scale is not necessarily consistent at a bigger scale.
The pair $[\underline{AT^k}(X), \overline{AT^k}(X)]$ is called the $k$-scale rough set of $X$ in the multiscale decision system. The $k$-scale rough set of Definition 4 is still based on an equivalence relation, so it satisfies the properties of Pawlak's rough set. We omit these properties in this paper.
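By Definition 4 and the chain $[x]_{AT^1} \subseteq [x]_{AT^2} \subseteq \cdots \subseteq [x]_{AT^I}$, the 1-scale lower approximation satisfies $[x]_{AT^1} \subseteq X \Leftrightarrow [x]_{AT^1} \subseteq X \vee [x]_{AT^2} \subseteq X \vee \cdots \vee [x]_{AT^I} \subseteq X$. Hence at least one of the $I$ levels of granulation must satisfy the inclusion condition between equivalence class and target concept. This interpretation is compatible with Qian et al.'s optimistic multigranulation lower approximation [31,33]. For the 1-scale upper approximation, $[x]_{AT^1} \cap X \neq \emptyset \Leftrightarrow [x]_{AT^1} \cap X \neq \emptyset \wedge [x]_{AT^2} \cap X \neq \emptyset \wedge \cdots \wedge [x]_{AT^I} \cap X \neq \emptyset$. Hence all $I$ levels of granulation must satisfy the intersection condition between equivalence class and target concept. This interpretation is also compatible with Qian et al.'s optimistic multigranulation upper approximation. The 1-scale rough set is therefore also called the optimistic multiscale rough set in a multiscale decision system.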
Now consider the $I$-scale lower approximation. Since $[x]_{AT^k} \subseteq [x]_{AT^I}$ for every $k$, we have $[x]_{AT^I} \subseteq X \Leftrightarrow [x]_{AT^1} \subseteq X \wedge [x]_{AT^2} \subseteq X \wedge \cdots \wedge [x]_{AT^I} \subseteq X$. Hence all $I$ levels of granulation must satisfy the inclusion condition between equivalence class and target concept. This interpretation is compatible with Qian et al.'s pessimistic multigranulation lower approximation [32]. For the $I$-scale upper approximation, $[x]_{AT^I} \cap X \neq \emptyset \Leftrightarrow [x]_{AT^1} \cap X \neq \emptyset \vee [x]_{AT^2} \cap X \neq \emptyset \vee \cdots \vee [x]_{AT^I} \cap X \neq \emptyset$. Hence at least one of the $I$ levels of granulation must satisfy the intersection condition between equivalence class and target concept. This interpretation is also compatible with Qian et al.'s pessimistic multigranulation upper approximation. The $I$-scale rough set is therefore also called the pessimistic multiscale rough set in a multiscale decision system.
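The equivalences behind the optimistic and pessimistic readings can be checked mechanically. The Python sketch below uses hypothetical nested partitions. It computes the 1-scale and $I$-scale approximations directly and compares them with the "at least one level" and "all levels" conditions. The data and helper names are illustrative only.

```python
# Hypothetical nested partitions U/AT^1, U/AT^2, U/AT^3 and a target concept X.
PARTITIONS = [
    [{"x1"}, {"x2"}, {"x3"}, {"x4"}, {"x5"}, {"x6"}],
    [{"x1", "x2"}, {"x3"}, {"x4"}, {"x5", "x6"}],
    [{"x1", "x2", "x3"}, {"x4"}, {"x5", "x6"}],
]
X = {"x1", "x3", "x4"}
U = set().union(*PARTITIONS[0])


def block_of(x, blocks):
    """[x]_{AT^k}: the block of the given partition that contains x."""
    return next(b for b in blocks if x in b)


def lower(blocks, target):
    """Union of the blocks included in the target."""
    return {x for b in blocks if b <= target for x in b}


def upper(blocks, target):
    """Union of the blocks meeting the target."""
    return {x for b in blocks if b & target for x in b}


# Optimistic reading: inclusion at some level, intersection at every level.
opt_lower = {x for x in U if any(block_of(x, p) <= X for p in PARTITIONS)}
opt_upper = {x for x in U if all(block_of(x, p) & X for p in PARTITIONS)}
# Pessimistic reading: inclusion at every level, intersection at some level.
pes_lower = {x for x in U if all(block_of(x, p) <= X for p in PARTITIONS)}
pes_upper = {x for x in U if any(block_of(x, p) & X for p in PARTITIONS)}
```

The 1-scale approximations coincide with the optimistic sets, and the 3-scale approximations coincide with the pessimistic sets.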
By Proposition 5, as the granulation becomes coarser (that is, as $k$ increases), the $k$-scale lower approximations become smaller and the $k$-scale upper approximations become bigger. The different levels of granulation therefore yield a nested sequence of rough sets in a multiscale decision system.
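For $X \neq \emptyset$, the accuracy of the $k$-scale rough approximation is $\alpha_{AT^k}(X) = \frac{|\underline{AT^k}(X)|}{|\overline{AT^k}(X)|}$, where $|X|$ denotes the cardinality of the set $X$. Clearly, $0 \leq \alpha_{AT^k}(X) \leq 1$.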
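The monotonicity in Proposition 5 and the accuracy measure can be illustrated numerically. In the Python sketch below, the nested partitions and the target set are hypothetical, and the helper names are illustrative. Accuracy is computed as the ratio of the lower to the upper approximation size for a nonempty target set.

```python
# Hypothetical nested partitions U/AT^1, U/AT^2, U/AT^3 and a nonempty target X.
PARTITIONS = [
    [{"x1"}, {"x2"}, {"x3"}, {"x4"}, {"x5"}, {"x6"}],
    [{"x1", "x2"}, {"x3"}, {"x4"}, {"x5", "x6"}],
    [{"x1", "x2", "x3"}, {"x4"}, {"x5", "x6"}],
]
X = {"x1", "x3", "x4"}


def lower(blocks, target):
    """k-scale lower approximation: union of blocks included in the target."""
    return {x for b in blocks if b <= target for x in b}


def upper(blocks, target):
    """k-scale upper approximation: union of blocks meeting the target."""
    return {x for b in blocks if b & target for x in b}


def accuracy(blocks, target):
    """|lower| / |upper|; the upper approximation is nonempty when target is."""
    return len(lower(blocks, target)) / len(upper(blocks, target))


lows = [lower(p, X) for p in PARTITIONS]
ups = [upper(p, X) for p in PARTITIONS]
accs = [accuracy(p, X) for p in PARTITIONS]
```

As the granulation coarsens, the lower approximation shrinks from {x1, x3, x4} to {x3, x4} to {x4}. The upper approximation grows from {x1, x3, x4} to {x1, x2, x3, x4}, and the accuracy drops from 1.0 to 0.5 to 0.25.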

k-Scale Descriptors Based Rough Set.
In this paper, an incomplete multiscale information system is still denoted by $S = (U, AT)$. In an incomplete multiscale information system $S$, if $a_j^k(x) = *$, we say that the value of object $x$ on attribute $a_j$ is unknown at the $k$th scale. We assume that the unknown value $*$ can be compared with any other value in the domain of the corresponding attribute [4,5]. We therefore use the descriptor-based rough set to analyze the incomplete multiscale information system.
In what follows, the symbols $\wedge$ and $\vee$ denote the logical connectives "and" (conjunction) and "or" (disjunction), respectively [15]. Let $S$ be an incomplete multiscale information system and $A \subseteq AT^k$. Any attribute-value pair $(a, v)$ with $a \in A$ and $v \in V_a$ is called an $A$-atomic property. An $A$-atomic property, or a conjunction of different $A$-atomic properties, is called an $A$-descriptor. If $(a, v)$ is an atomic property occurring in an $A$-descriptor $t$, we simply write $(a, v) \in t$. Because $t$ is constructed at scale $k$, it can also be called a $k$-scale descriptor.
Let $t$ and $t'$ be $A$-descriptors. Suppose that $(a, v) \in t$ for every $(a, v) \in t'$, that is, $t'$ is constructed from a subset of the atomic properties occurring in $t$. Then $t'$ is coarser than $t$, or equivalently $t$ is finer than $t'$, written $t' \succeq t$ or $t \preceq t'$. If $t'$ is constructed from a proper subset of the atomic properties occurring in $t$, then $t'$ is properly coarser than $t$, written $t' \succ t$ or $t \prec t'$.
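The coarser-than relation compares descriptors only through their atomic properties, so it can be modeled with sets. The Python sketch below assumes an illustrative representation in which a descriptor is a frozenset of (attribute, value) pairs. Under that assumption, $t' \succeq t$ is plain set inclusion and $t' \succ t$ is proper inclusion. The descriptors coarser than $t$ are then exactly the nonempty subsets of its atomic properties.

```python
from itertools import combinations


def is_coarser(t1, t2):
    """t1 ⪰ t2: every atomic property of t1 also occurs in t2."""
    return t1 <= t2


def is_properly_coarser(t1, t2):
    """t1 ≻ t2: t1 uses a proper subset of the atomic properties of t2."""
    return t1 < t2


def coarser_descriptors(t):
    """All descriptors t' with t' ⪰ t, i.e., the nonempty subsets of t."""
    props = sorted(t)
    return [frozenset(c)
            for r in range(1, len(props) + 1)
            for c in combinations(props, r)]


# Hypothetical descriptor (math, A) ∧ (phys, B) and one of its coarser descriptors.
t = frozenset({("math", "A"), ("phys", "B")})
t_prime = frozenset({("math", "A")})
```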
Let $t$ be an $A$-descriptor, and denote the set of attributes occurring in $t$ by $A(t)$. If $A(t) = A$, then $t$ is called a full $A$-descriptor. For a full $A$-descriptor $\wedge_{a \in A}(a, v_a)$, let $\|\wedge_{a \in A}(a, v_a)\| = \{x \in U : a(x) = v_a \text{ or } a(x) = *, \forall a \in A\}$. This set is called the support of $\wedge_{a \in A}(a, v_a)$. Let $FDES(A)$ denote the set of all full $A$-descriptors. The descriptor technique divides the universe at scale $k$ into several possibly overlapping subsets, and the result is denoted by $U/AT^k = \{\|t\| : t \in FDES(AT^k)\}$.

Table 1: An example of an incomplete multiscale decision system.

In a complete multiscale decision system, the hierarchical structure is represented by a partial order among the equivalence relations, or among the partitions, at different scales. In an incomplete multiscale decision system, each level of granulation yields a family of descriptor supports, and this family forms a covering of the universe of discourse. These supports represent the hierarchical structure as $U/AT^1 \sqsubseteq U/AT^2 \sqsubseteq \cdots \sqsubseteq U/AT^I$. Here, for $i, j \in \{1, 2, \ldots, I\}$ with $i \leq j$, $U/AT^i \sqsubseteq U/AT^j$ means that both of the following conditions hold: (1) $\forall \|t\| \in U/AT^i$, there exists $\|t'\| \in U/AT^j$ such that $\|t\| \subseteq \|t'\|$; (2) $\forall \|t'\| \in U/AT^j$, there exists $\|t\| \in U/AT^i$ such that $\|t\| \subseteq \|t'\|$.
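The construction of the coverings and the relation $\sqsubseteq$ can be sketched in Python under three assumptions. First, following the convention that $*$ can be compared with any value, an object belongs to the support of a full descriptor when each of its values equals the descriptor's value or is $*$. Second, $FDES(AT^k)$ is built from the known values of each attribute. Third, descriptors with empty support are discarded. The two-scale table and the coarsening map `G` are hypothetical.

```python
from itertools import product

STAR = "*"  # unknown value

# Hypothetical scale-1 table with unknown values.
LEVEL1 = {
    "x1": {"a": 1, "b": 1},
    "x2": {"a": 2, "b": STAR},
    "x3": {"a": 3, "b": 3},
    "x4": {"a": STAR, "b": 3},
    "x5": {"a": 3, "b": 1},
}
# Hypothetical surjective coarsening map to scale 2; '*' stays unknown.
G = {1: "L", 2: "L", 3: "H", STAR: STAR}
LEVEL2 = {x: {a: G[v] for a, v in row.items()} for x, row in LEVEL1.items()}


def support(table, descriptor):
    """||t||: objects whose value equals v or is '*' for every (a, v) in t."""
    return frozenset(x for x, row in table.items()
                     if all(row[a] in (v, STAR) for a, v in descriptor))


def covering(table, attrs):
    """U/A: nonempty supports of the full A-descriptors built from known values."""
    domains = [sorted({row[a] for row in table.values()} - {STAR}) for a in attrs]
    result = set()
    for values in product(*domains):
        s = support(table, tuple(zip(attrs, values)))
        if s:
            result.add(s)
    return result


def finer_or_equal(p, q):
    """p ⊑ q: conditions (1) and (2) from the text must both hold."""
    cond1 = all(any(s <= t for t in q) for s in p)
    cond2 = all(any(s <= t for s in p) for t in q)
    return cond1 and cond2


cov1 = covering(LEVEL1, ["a", "b"])
cov2 = covering(LEVEL2, ["a", "b"])
```

Because x4 is unknown on `a`, the scale-1 covering contains overlapping supports such as {x2, x4} and {x3, x4}. On this data, `finer_or_equal(cov1, cov2)` confirms both conditions between the two scales.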

Remark 7.
If $U/AT^i$ and $U/AT^j$ are both partitions, condition (1) implies condition (2), so condition (1) alone suffices. In an incomplete multiscale information system, however, $U/AT^i$ and $U/AT^j$ may be coverings rather than partitions, so both conditions are needed.
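A tiny Python check illustrates Remark 7 on hypothetical sets. For the two partitions, condition (1) holds and forces condition (2). For the two coverings, condition (1) holds while condition (2) fails, so the incomplete case needs both conditions.

```python
def cond1(finer, coarser):
    """(1) every finer block lies inside some coarser block."""
    return all(any(s <= t for t in coarser) for s in finer)


def cond2(finer, coarser):
    """(2) every coarser block contains some finer block."""
    return all(any(s <= t for s in finer) for t in coarser)


# Hypothetical partitions of {1, 2, 3}: the first refines the second.
part_fine, part_coarse = [{1}, {2}, {3}], [{1, 2}, {3}]
# Hypothetical coverings of {1, 2, 3}: the support {3} contains no finer support.
cov_fine, cov_coarse = [{1, 2}, {2, 3}], [{1, 2, 3}, {3}]
```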
These two conditions for expressing the hierarchical structure in an incomplete multiscale information system reflect the idea of a surjective function. Specifically, for $i \leq j$, a surjective function from $U/AT^i$ to $U/AT^j$ can be defined in an incomplete multiscale information system. This function transforms the granulation space at a smaller scale into the granulation space at a bigger scale.
Example 8. Table 1 shows an incomplete multiscale decision system with a set of condition attributes and a decision attribute $d$. The system has three levels of granulation, and the value codes in Table 1 abbreviate "good," "fair," "bad," "low," "medium," "high," "yes," and "no." The descriptor technique described above readily gives the descriptors and their supports at each level of granulation. The full $AT^1$-, $AT^2$-, and $AT^3$-descriptors are listed in Tables 2, 3, and 4, respectively.

As in the complete case, a 1-scale consistent incomplete multiscale decision system is called an optimistic consistent incomplete multiscale decision system.

Remark 11.
The converse of Proposition 10 does not always hold: an incomplete multiscale decision system that is consistent at a smaller scale is not necessarily consistent at a bigger scale.
Example 13. Consider Table 1, in which the decision attribute partitions the universe into three disjoint subsets. The approximations are as follows. (1) 1-scale lower and upper approximations:

-scale lower and upper approximations:
By the above computations, we can see that Table 1 is inconsistent at each level of granulation, that is, each scale.
The results in Proposition 14 agree with those in Proposition 5. As the level of granulation varies in an incomplete multiscale decision system, the lower approximations, upper approximations, and boundary regions change monotonically.

k-Scale Decision Rules.
The end result of a rough set analysis is a representation of the information in the data system in terms of "if... then..." decision rules [43,44]. An incomplete multiscale decision system contains a family of systems with different levels of granulation, so decision rules can be derived at each of its scales. For example, let $S = (U, AT \cup \{d\}) = (U, \{a_j^k : k = 1, 2, \ldots, I, j = 1, 2, \ldots, m\} \cup \{d\})$ be an incomplete multiscale decision system. A $k$-scale decision rule is then written $r: t \rightarrow s$, where $t \in FDES(AT^k)$, $s = (d, w)$, and $w \in V_d$. The descriptors $t$ and $s$ are called the condition part and the decision part of the rule $r$, respectively.
Each $k$-scale decision rule $r: t \rightarrow s$ is associated with a quantitative measure called the certainty of $r$, defined by $Cer(r) = \frac{|\|t\| \cap \|s\||}{|\|t\||}$. The rule $r$ is certain if $Cer(r) = 1$ and possible if $0 < Cer(r) < 1$. For $l \leq k$, two properties hold.

(1) Since $U/AT^1 \sqsubseteq U/AT^2 \sqsubseteq \cdots \sqsubseteq U/AT^I$, for every $\|t\| \in U/AT^k$ there exists $\|t'\| \in U/AT^l$ such that $\|t'\| \subseteq \|t\|$. Hence, if $r: t \rightarrow s$ is a certain $k$-scale decision rule, then $r': t' \rightarrow s$ is a certain $l$-scale decision rule.

(2) Since $U/AT^1 \sqsubseteq U/AT^2 \sqsubseteq \cdots \sqsubseteq U/AT^I$, for every $\|t'\| \in U/AT^l$ there exists $\|t\| \in U/AT^k$ such that $\|t'\| \subseteq \|t\|$. Hence, if $r': t' \rightarrow s$ is a possible $l$-scale decision rule, then $r: t \rightarrow s$ is a possible $k$-scale decision rule.
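The certainty measure and the two inheritance properties can be sketched in Python. The supports are hypothetical: a finer covering `FINE`, a coarser covering `COARSE`, and the support `D` of a decision part. A rule is labeled certain when its certainty is 1 and possible when its certainty lies strictly between 0 and 1. This matches the use of these terms for lower approximation and boundary region rules.

```python
# Hypothetical supports at a finer scale l and a coarser scale k (l <= k).
FINE = [frozenset({"x1"}), frozenset({"x2"}), frozenset({"x4"}),
        frozenset({"x2", "x4"}), frozenset({"x5"}), frozenset({"x3", "x4"})]
COARSE = [frozenset({"x1", "x2"}), frozenset({"x2", "x4"}),
          frozenset({"x5"}), frozenset({"x3", "x4"})]
D = frozenset({"x1", "x2"})  # hypothetical support of the decision part s


def certainty(t_support, s_support):
    """Cer(t -> s) = |‖t‖ ∩ ‖s‖| / |‖t‖| for a nonempty support ‖t‖."""
    return len(t_support & s_support) / len(t_support)


def is_possible(t_support, s_support):
    """A rule is possible when its certainty lies strictly between 0 and 1."""
    return 0 < certainty(t_support, s_support) < 1


def certain_inherited(fine, coarse, s):
    """(1): finer supports inside a certain coarser support give certain rules."""
    return all(certainty(f, s) == 1
               for c in coarse if certainty(c, s) == 1
               for f in fine if f <= c)


def possible_inherited(fine, coarse, s):
    """(2): coarser supports around a possible finer support give possible rules."""
    return all(is_possible(c, s)
               for f in fine if is_possible(f, s)
               for c in coarse if f <= c)
```

Both checks pass on this data, as the argument above predicts. Inclusion in `D` passes down to smaller supports. A support that meets `D` without lying inside it keeps that property when enlarged.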
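Example 15. From the approximations obtained in Example 13, the following decision rules can be derived at the three levels of granulation in Table 1. (1) 1-scale decision rules: (a) 1-scale certain decision rules:

Definition 16. Let $t$ be a full $AT^k$-descriptor and $t' \succeq t$. Then $t'$ is a reduct descriptor of $t$ if and only if the following two conditions hold: (1) $\|t'\| = \|t\|$; (2) $\|t''\| \neq \|t\|$ for each $t'' \succ t'$.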
By Definition 16, a reduct descriptor of $t$ is a minimal conjunction of the atomic properties in $t$ that preserves the support of $t$. It therefore classifies objects with the fewest atomic properties.
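Definition 16 can be turned into a brute-force search. The Python sketch below is an illustrative alternative to the discernibility-matrix method developed later in the paper. It enumerates the coarser descriptors of $t$ by increasing size and keeps those whose support equals $\|t\|$. It discards any candidate that properly contains a descriptor already kept, which enforces condition (2). The incomplete table is hypothetical, and support membership treats $*$ as matching any value.

```python
from itertools import combinations

STAR = "*"  # unknown value

# Hypothetical one-scale incomplete table with three condition attributes.
TABLE = {
    "x1": {"a": 1, "b": 1, "c": 1},
    "x2": {"a": 1, "b": 2, "c": STAR},
    "x3": {"a": 2, "b": 1, "c": 2},
    "x4": {"a": STAR, "b": 2, "c": 2},
}


def support(table, descriptor):
    """Objects whose value equals v or is '*' for every (a, v) in the descriptor."""
    return frozenset(x for x, row in table.items()
                     if all(row[a] in (v, STAR) for a, v in descriptor))


def reduct_descriptors(table, t):
    """All t' ⪰ t with ||t'|| = ||t|| such that no properly coarser t'' keeps ||t||."""
    target = support(table, t)
    props = sorted(t)
    found = []
    for r in range(1, len(props) + 1):
        for combo in combinations(props, r):
            cand = frozenset(combo)
            if any(red < cand for red in found):
                continue  # contains a smaller support-preserving descriptor
            if support(table, cand) == target:
                found.append(cand)
    return found


t = (("a", 1), ("b", 1), ("c", 1))
reducts = reduct_descriptors(TABLE, t)
```

For this $t$, whose support is {x1}, the search returns $(a, 1) \wedge (b, 1)$ and $(b, 1) \wedge (c, 1)$. No single atomic property has support {x1}.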

Lower Approximation and Boundary Region Reduct Descriptors.
As noted in Section 3.2, certain decision rules are generated from the descriptors whose supports lie in the lower approximation. Possible decision rules are generated from the descriptors whose supports lie in the boundary region. To obtain simplified decision rules, the concept of reduct can therefore also be introduced into the lower approximation and the boundary region.
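Then $t'$ is the lower approximation reduct descriptor of $t$ if and only if the following two conditions hold: (1) $\{X \in U/d : \|t'\| \subseteq X\} = \{X \in U/d : \|t\| \subseteq X\}$; (2) $\{X \in U/d : \|t''\| \subseteq X\} \neq \{X \in U/d : \|t\| \subseteq X\}$ for each $t'' \succ t'$. Likewise, $t'$ is the boundary region reduct descriptor of $t$ if and only if the following two conditions hold: (1) $\{X \in U/d : \|t'\| \cap X \neq \emptyset\} = \{X \in U/d : \|t\| \cap X \neq \emptyset\}$; (2) $\{X \in U/d : \|t''\| \cap X \neq \emptyset\} \neq \{X \in U/d : \|t\| \cap X \neq \emptyset\}$ for each $t'' \succ t'$.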
By Definition 20, a lower approximation reduct descriptor of $t$ is a minimal conjunction of the atomic properties in $t$ that preserves the inclusion relation between the support of $t$ and the decision classes. A boundary region reduct descriptor of $t$ is a minimal conjunction of the atomic properties in $t$ that preserves the intersection relation between the support of $t$ and the decision classes.
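The same brute-force idea extends to Definition 20 if the preserved quantity is read, as the explanation above states, as a set of decision classes. For the lower approximation, it is the set of classes that contain the support; for the boundary region, it is the set of classes that intersect it. The Python sketch below encodes these two readings as signature functions. The table, the decision classes, and all helper names are hypothetical, and the search is again an alternative to the paper's discernibility-function method.

```python
from itertools import combinations

STAR = "*"  # unknown value

# Hypothetical incomplete table and its decision classes {x1, x2} and {x3, x4}.
TABLE = {
    "x1": {"a": 1, "b": 1, "c": 1},
    "x2": {"a": 1, "b": 2, "c": STAR},
    "x3": {"a": 2, "b": 1, "c": 2},
    "x4": {"a": STAR, "b": 2, "c": 2},
}
CLASSES = [frozenset({"x1", "x2"}), frozenset({"x3", "x4"})]


def support(table, descriptor):
    """Objects whose value equals v or is '*' for every (a, v) in the descriptor."""
    return frozenset(x for x, row in table.items()
                     if all(row[a] in (v, STAR) for a, v in descriptor))


def lower_signature(s, classes):
    """Indices of the decision classes that contain the support s."""
    return frozenset(i for i, cls in enumerate(classes) if s <= cls)


def boundary_signature(s, classes):
    """Indices of the decision classes that intersect the support s."""
    return frozenset(i for i, cls in enumerate(classes) if s & cls)


def minimal_preserving(table, t, signature, classes):
    """Minimal t' ⪰ t whose support has the same signature as ||t||."""
    target = signature(support(table, t), classes)
    props = sorted(t)
    found = []
    for r in range(1, len(props) + 1):
        for combo in combinations(props, r):
            cand = frozenset(combo)
            if any(red < cand for red in found):
                continue  # a properly coarser descriptor already preserves it
            if signature(support(table, cand), classes) == target:
                found.append(cand)
    return found


low = minimal_preserving(TABLE, (("a", 1), ("b", 1), ("c", 1)), lower_signature, CLASSES)
bnd = minimal_preserving(TABLE, (("a", 1), ("b", 2), ("c", 2)), boundary_signature, CLASSES)
```

Here $(a, 1) \wedge (b, 1) \wedge (c, 1)$ has support {x1}, which lies inside the class {x1, x2}. Both $(c, 1)$ and $(a, 1) \wedge (b, 1)$ keep that inclusion. The descriptor $(a, 1) \wedge (b, 2) \wedge (c, 2)$ has support {x2, x4}, which meets both classes, and each of its single atomic properties already does so.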
are referred to as the lower approximation and boundary region discernibility matrices, respectively.
Proof. We only prove (1); the proof of (2) is similar.
is referred to as the lower approximation discernibility function; is referred to as the boundary region discernibility function.

In Table 1, if the support of a descriptor lies in the lower approximation of a decision class, the lower approximation reduct descriptor can be computed to derive a simplified certain decision rule. Similarly, if the support of a descriptor lies in the boundary region of a decision class, the boundary region reduct descriptor can be computed to derive a simplified possible decision rule. The discernibility matrices above readily give the lower approximation and boundary region reduct descriptors shown in Tables 5 and 6, respectively.
The lower approximation reduct descriptors in Table 5 yield the following certain decision rules: (1) 1-scale certain decision rules:
The boundary region reduct descriptors in Table 6 yield the following possible decision rules: (1) 1-scale possible decision rules: