On the depth of decision trees over infinite 1-homogeneous binary information systems

In this paper, we study decision trees that solve problems defined over a specific subclass of infinite information systems, namely, 1-homogeneous binary information systems. We prove that, for each information system in this class, the minimum depth of a decision tree (considered as a function of the number of attributes in the problem's description) grows in the worst case either logarithmically or linearly. We consider a number of examples of infinite 1-homogeneous binary information systems, including one closely related to the decision trees constructed by the CART algorithm.

Most of the results related to decision trees were obtained for decision trees over finite sets of attributes, in fact, for decision tables. However, decision trees over infinite sets of attributes, in particular, linear [10,15,17], quasilinear [17], and algebraic decision trees [12,23], as well as the related algebraic computation trees [5,11], have been intensively studied as algorithms in combinatorial optimization and computational geometry. For example, for the traveling salesman problem with $n \ge 4$ cities, there exists a linear decision tree of depth at most $n^7/2$ that solves this problem [15].
There are two approaches to the study of decision trees over infinite sets of attributes: the local approach, in which decision trees use only attributes from the problem description, and the global approach, in which decision trees can use arbitrary attributes from the considered infinite set of attributes [3,17,18]. The results of this paper are obtained within the framework of the global approach. Constructing decision trees in this approach is computationally more demanding than in the local approach; however, the global approach can often yield better decision trees.
In our previous research on the global approach to decision trees [16], we investigated arbitrary infinite sets of $k$-valued attributes construed as information systems $U = (A, F)$, where $A$ is an infinite set of objects (inputs) and $F$ is an infinite set of attributes, each of which is a mapping from $A$ to the set $\{0, 1, \ldots, k-1\}$, $k \ge 2$. The notion of a problem over $U$ is defined as follows. Attributes $f_1, \ldots, f_n$ from $F$ divide the set $A$ into domains in which the values of these attributes are constant.
Domains are labeled with decisions. For an arbitrary object $a \in A$, it is required to recognize the decision attached to the domain containing $a$. To solve this problem, decision trees with attributes from $F$ are used. For an arbitrary infinite information system, the minimum depth of a decision tree in the worst case (as a function of the number of attributes in a problem's description) either is bounded from below by a logarithm and from above by a logarithm to the power $1 + \varepsilon$, where $\varepsilon$ is an arbitrary positive constant, or grows linearly.
The additional $\varepsilon$ does not guarantee that the number of nodes in the considered decision tree is polynomial in the number of attributes in the problem's description. It is therefore interesting to describe classes of infinite information systems for which the upper bounds on the minimum depth of decision trees do not contain the additional constant $\varepsilon$. A well-known example is the class of information systems $(\mathbb{R}^d, L_d)$, $d \in \mathbb{N}$, where $\mathbb{R}$ is the set of real numbers, $\mathbb{N}$ is the set of natural numbers, and $L_d$ is the set of 3-valued attributes corresponding to hyperplanes in $\mathbb{R}^d$ [10]. Other examples can be found in [17].
In this paper, we consider one more such class: infinite 1-homogeneous binary information systems $U = (A, F)$. The word "binary" means that each attribute from $F$ has values from the set $\{0, 1\}$. We prove that, for each infinite 1-homogeneous binary information system, the minimum depth of a decision tree in the worst case (as a function of the number of attributes in a problem's description) grows either logarithmically or linearly.
We define a partial order $\prec$ on the set of attributes $F$ of an infinite 1-homogeneous binary information system $U = (A, F)$. If the cardinality of antichains in $F$ is not bounded from above, then we prove that, in the worst case, the minimum depth of decision trees grows linearly in the number of attributes in the problem's description.
If the cardinality of antichains in $F$ is bounded from above, then, by Dilworth's theorem [9,13], the set $F$ can be partitioned into a finite number of chains. In each chain, any two attributes are comparable. To find the values of a finite number of attributes from a chain, we can use an analog of the binary search algorithm. We prove that, in this case, the minimum depth of decision trees grows in the worst case logarithmically in the number of attributes in the problem's description.
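The binary search idea for a single chain can be sketched as follows. This is a minimal illustration under a hypothetical assumption that is not taken from the paper: the chain consists of threshold attributes $f_b(a) = 1$ if $a \ge b$ and $0$ otherwise, ordered by increasing $b$, so along the chain the values have the form $1, \ldots, 1, 0, \ldots, 0$. Each computed value fixes the values of all attributes on one side of it, so at most $\lceil \log_2(n+1) \rceil$ of the $n$ attributes need to be computed.

```python
from typing import Callable, List, Sequence, Tuple

Attribute = Callable[[float], int]


def threshold(b: float) -> Attribute:
    """Hypothetical binary attribute: 1 if a >= b, otherwise 0."""
    return lambda a: 1 if a >= b else 0


def chain_values(chain: Sequence[Attribute], a: float) -> Tuple[List[int], int]:
    """Find the values of all attributes of a chain on object a by binary search.

    Assumes the values along the chain are non-increasing (1, ..., 1, 0, ..., 0).
    Returns the list of values and the number of attributes actually computed.
    """
    lo, hi = 0, len(chain)  # invariant: value is 1 for indices < lo, 0 for indices >= hi
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if chain[mid](a) == 1:
            lo = mid + 1  # all attributes before mid also have value 1
        else:
            hi = mid  # all attributes after mid also have value 0
    return [1] * lo + [0] * (len(chain) - lo), queries


chain = [threshold(b) for b in range(8)]
values, queries = chain_values(chain, 4.5)
print(values, queries)  # [1, 1, 1, 1, 1, 0, 0, 0] 3
```

Here three of the eight attributes are computed; for eight attributes the bound $\lceil \log_2 9 \rceil = 4$ is never exceeded.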
We consider examples of infinite 1-homogeneous binary information systems in Section 3. One of the most interesting is the information system … and $b \in \mathbb{R}$. Attributes from this system are used by decision trees constructed by CART [7] for decision tables without categorical attributes and with $n$ continuous attributes.
During the operation of a decision tree, we compute the values of some attributes and obtain equations of the form "attribute = value". Then we derive the decision from the obtained equations and the equations that are their consequences. The main novelty of this paper is the consideration of the mechanism of consequence inference. One can study information systems with special inference mechanisms, as in this paper (see also another mechanism described in the conclusions). One can also restrict the inference mechanism and consider only the consequences derived from one equation, or from at most two equations, etc. The present paper initiates a new direction of research related to decision trees and based on the study of consequence inference mechanisms.
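A toy version of such an inference step is sketched below. The attribute family and the inference rules are hypothetical, chosen only for illustration and not taken from the paper: with threshold attributes $f_b(a) = 1$ if and only if $a \ge b$, the equation $f_b(a) = 1$ implies $f_c(a) = 1$ for every $c \le b$, and $f_b(a) = 0$ implies $f_c(a) = 0$ for every $c \ge b$. Each consequence is derived from a single equation, which corresponds to the restricted kind of mechanism mentioned above.

```python
from typing import Dict, Iterable


def infer(equations: Dict[float, int], thresholds: Iterable[float]) -> Dict[float, int]:
    """Extend equations "f_b = value" with their single-equation consequences.

    Hypothetical setting: f_b(a) = 1 iff a >= b. The equations are assumed to
    come from one object a, so they are consistent with each other.
    """
    derived = dict(equations)
    for c in thresholds:
        for b, value in equations.items():
            if value == 1 and c <= b:
                derived[c] = 1  # a >= b >= c, hence f_c(a) = 1
            elif value == 0 and c >= b:
                derived[c] = 0  # a < b <= c, hence f_c(a) = 0
    return derived


known = infer({2.0: 1, 5.0: 0}, [1.0, 3.0, 6.0])
print(known)  # {2.0: 1, 5.0: 0, 1.0: 1, 6.0: 0}
```

The value of $f_{3.0}$ is not determined by the two given equations, so it does not appear in the result.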
The rest of the paper is organized as follows. In Section 2, we discuss the main notions. In Section 3, we consider examples of infinite 1-homogeneous binary information systems. Section 4 is devoted to the study of the depth of decision trees over infinite 1-homogeneous binary information systems. Section 5 contains short conclusions.

Main notions
In this section, we define the main notions: infinite 1-homogeneous binary information system, problem over such a system, decision tree solving this problem, Shannon function related to the information system, and the width of the information system.
For a given $a \in A$, the tree $\Gamma$ works as follows. We start at the root. If the current node is a terminal node, then the number from $\mathbb{N}$ attached to this node is the result of the work of $\Gamma$. Let the current node be labeled with an attribute $f \in F$. Then we compute the value $f(a)$ and pass along the edge that leaves this node and is labeled with the number $f(a)$.
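The way $\Gamma$ processes an object can be sketched in a few lines. This is an illustrative encoding rather than the paper's formalism: the node classes `Terminal` and `Inner`, the hypothetical threshold attributes, and the small problem `z` are all assumptions. The problem assigns to each real number the number of thresholds it reaches; the tree below has depth 2 and returns $z(a)$ on every sample object checked at the end.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Attribute = Callable[[float], int]


def threshold(b: float) -> Attribute:
    """Hypothetical binary attribute: 1 if a >= b, otherwise 0."""
    return lambda a: 1 if a >= b else 0


@dataclass
class Terminal:
    decision: int  # number from N attached to a terminal node


@dataclass
class Inner:
    attribute: Attribute  # attribute f from F labeling the node
    children: Dict[int, "Node"]  # edge label (value of f) -> child node


Node = Union[Terminal, Inner]


def run(tree: Node, a: float) -> int:
    """Walk from the root, following the edge labeled f(a) at each inner node."""
    node = tree
    while isinstance(node, Inner):
        node = node.children[node.attribute(a)]
    return node.decision


def depth(tree: Node) -> int:
    """Maximum number of inner nodes on a path from the root to a terminal node."""
    if isinstance(tree, Terminal):
        return 0
    return 1 + max(depth(child) for child in tree.children.values())


f0, f1 = threshold(0.0), threshold(1.0)


def z(a: float) -> int:
    # hypothetical problem over f0, f1: the decision of a domain is f0(a) + f1(a)
    return f0(a) + f1(a)


gamma = Inner(f0, {0: Terminal(0), 1: Inner(f1, {0: Terminal(1), 1: Terminal(2)})})
print(depth(gamma), all(run(gamma, a) == z(a) for a in [-1.0, 0.0, 0.5, 1.0, 2.0]))  # 2 True
```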

Definition 4.
We will say that the decision tree $\Gamma$ solves the problem $z$ if, for any $a \in A$, the result of the work of $\Gamma$ is equal to $z(a)$. We denote by …

Definition 7.
We will say that a subset $G$ of the set $F$ is independent if there are no different attributes $f_1, f_2 …$

Definition 8.
We will say that the set $F$ has infinite width if, for any natural $m$, there exists an independent subset of $F$ whose cardinality is equal to $m$. Otherwise, the width of $F$ is finite and is equal to the maximum cardinality of an independent subset of $F$. The width of the information system $U = (A, F)$ is the width of the set $F$.
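For a finite set of attributes, the width in the sense of Definition 8 can be computed directly by brute force. Since the condition in Definition 7 is only partly reproduced here, the sketch assumes that a set is independent when no two different attributes in it are comparable, that is, when it is an antichain. The order `less` is hypothetical: attributes are pairs `(i, b)` standing for "coordinate $i$ is at least $b$", and `(i, b)` precedes `(j, c)` when `i == j` and `b < c`. The search is exponential and is meant only for tiny examples.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Attr = Tuple[int, float]  # hypothetical attribute "x_i >= b" encoded as (i, b)


def less(p: Attr, q: Attr) -> bool:
    """Hypothetical strict partial order: same coordinate, smaller threshold."""
    return p[0] == q[0] and p[1] < q[1]


def is_independent(G: Sequence[Attr], order: Callable[[Attr, Attr], bool]) -> bool:
    """Assumed reading of Definition 7: no two different attributes are comparable."""
    return all(not order(f, g) and not order(g, f) for f, g in combinations(G, 2))


def width(F: Sequence[Attr], order: Callable[[Attr, Attr], bool]) -> int:
    """Maximum cardinality of an independent subset of a finite set F."""
    for m in range(len(F), 0, -1):
        if any(is_independent(G, order) for G in combinations(F, m)):
            return m
    return 0


F = [(0, 1.0), (0, 2.0), (1, 0.5), (1, 3.0), (1, 4.0)]
print(width(F, less))  # 2
```

With two coordinates, any three attributes include two on the same coordinate, which are comparable, so the width here is 2.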

Examples of infinite 1-homogeneous binary information systems
In this section, we consider four examples of infinite 1-homogeneous binary information systems. In the first three examples, the attributes have a clear geometric interpretation, which makes the examples easier to understand.
… (see Fig. 1). It is easy to show that $U_0$ is an infinite 1-homogeneous binary information system. The width of this system is equal to one.
Figure 2 illustrates the information system $V_2$. One can show that $V_n$ is an infinite 1-homogeneous binary information system. The width of this system is equal to $n$.
… (see Fig. 3). The attribute corresponding to a circle $c$ from the right system has value 0 outside the circle $c$ (see Fig. 3). We denote by $Q$ the set of attributes corresponding to all circles from both the left and right systems.
One can show that $W$ is an infinite 1-homogeneous binary information system. The width of this system is equal to one.
One can show that $V_{\mathbb{N}}$ is an infinite 1-homogeneous binary information system. This system has infinite width.

Behavior of Shannon function
The following theorem gives criteria for the linear and logarithmic growth of the Shannon function depending on the width of the considered infinite 1-homogeneous binary information system.
… Therefore, the maximum cardinality of an antichain is equal to $m$. According to Dilworth's theorem [9,13], the set $F$ can be partitioned into $m$ chains $C_1, \ldots, C_m$. In each chain, any two attributes are comparable.
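For a finite set of attributes, a partition into the minimum number of chains can be computed with the classical reduction of Dilworth's theorem to maximum bipartite matching: draw an edge from $u$ to $v$ whenever $u \prec v$; the number of chains then equals the number of elements minus the size of a maximum matching. The sketch below finds the matching with simple augmenting paths (Kuhn's algorithm) and reuses a hypothetical order on pairs `(i, b)` ("coordinate $i$ is at least $b$", comparable only on the same coordinate). It illustrates the statement of the theorem and is not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Attr = Tuple[int, float]  # hypothetical attribute "x_i >= b" encoded as (i, b)


def less(p: Attr, q: Attr) -> bool:
    """Hypothetical strict partial order: same coordinate, smaller threshold."""
    return p[0] == q[0] and p[1] < q[1]


def chain_partition(F: Sequence[Attr], order: Callable[[Attr, Attr], bool]) -> List[List[Attr]]:
    """Partition a finite poset into the minimum number of chains."""
    n = len(F)
    succ = [[v for v in range(n) if order(F[u], F[v])] for u in range(n)]
    match_to = [-1] * n  # match_to[v] == u means edge u -> v is in the matching

    def augment(u: int, seen: List[bool]) -> bool:
        # search for an augmenting path starting at the left copy of u
        for v in succ[u]:
            if not seen[v]:
                seen[v] = True
                if match_to[v] == -1 or augment(match_to[v], seen):
                    match_to[v] = u
                    return True
        return False

    for u in range(n):
        augment(u, [False] * n)

    nxt = [-1] * n  # successor of each element in its chain
    for v, u in enumerate(match_to):
        if u != -1:
            nxt[u] = v
    chains = []
    for start in range(n):
        if match_to[start] == -1:  # no matched predecessor: start of a chain
            chain, cur = [], start
            while cur != -1:
                chain.append(F[cur])
                cur = nxt[cur]
            chains.append(chain)
    return chains


F = [(0, 1.0), (0, 2.0), (1, 0.5), (1, 3.0), (1, 4.0)]
print(chain_partition(F, less))
# [[(0, 1.0), (0, 2.0)], [(1, 0.5), (1, 3.0), (1, 4.0)]]
```

Because the order is transitive, consecutive elements of each returned chain are comparable, so every pair in a chain is comparable; within each chain, attribute values can then be located by an analog of binary search.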
Let n be a natural number and