The asymptotic distribution of the isotonic regression estimator over a general countable pre-ordered set

We study the isotonic regression estimator over a general countable pre-ordered set. We obtain the limiting distribution of the estimator and study its properties. It is proved that, under some general assumptions, the limiting distribution of the isotonized estimator is given by the concatenation of the separate isotonic regressions of certain subvectors of an unrestricted estimator's asymptotic distribution. We also show that the isotonization preserves the rate of convergence of the underlying estimator. We apply these results to the problems of estimation of a bimonotone regression function and of a bimonotone probability mass function.


Introduction
Let X be a countable set {x_1, x_2, . . .} with |X| ≤ ∞, with a pre-order ≪ defined on it. We begin with the definitions of the order relations on an arbitrary set X and of an isotonic regression over it, cf. [4,16,17].

Definition 1 A binary relation ≪ on X is a simple order if
(i) it is reflexive, i.e. x ≪ x for all x ∈ X; (ii) it is transitive, i.e. x_1, x_2, x_3 ∈ X, x_1 ≪ x_2 and x_2 ≪ x_3 imply x_1 ≪ x_3; (iii) it is antisymmetric, i.e. x_1, x_2 ∈ X, x_1 ≪ x_2 and x_2 ≪ x_1 imply x_1 = x_2; (iv) every two elements of X are comparable, i.e. x_1, x_2 ∈ X implies that either x_1 ≪ x_2 or x_2 ≪ x_1.
A binary relation ≪ on X is a partial order if it is reflexive, transitive and antisymmetric, but there may be noncomparable elements. A pre-order is reflexive and transitive but not necessarily antisymmetric, and the set X can have noncomparable elements. Note that in some literature a pre-order is called a quasi-order.
Let us introduce the notation x_1 ∼ x_2 if x_1 and x_2 are comparable, i.e. if x_1 ≪ x_2 or x_2 ≪ x_1.
Let F^is = F^is(X) denote the family of real-valued bounded functions f on a set X which are isotonic with respect to the pre-order ≪ on X. In the case |X| = ∞ we consider the functions from the space l^w_2, the Hilbert space of real-valued functions on X which are square summable with some given non-negative weights w = {w_1, w_2, . . .}, i.e. any g ∈ l^w_2 satisfies ∑_{i=1}^∞ g(x_i)^2 w_i < ∞. We use the same notation F^is to denote the functions from l^w_2 which are isotonic with respect to the pre-order ≪.
Definition 3 A function g*: X → R is the isotonic regression of a function g: X → R over the pre-ordered set X with weights w ∈ R^s_+, with s ≤ ∞, if

g* = argmin_{f ∈ F^is} ∑_{i=1}^s (g(x_i) − f(x_i))^2 w_{x_i},

where w_{x_i} = w_i, for i = 1, . . . , s.
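For the special case of a simple (total) order x_1 ≪ x_2 ≪ · · · ≪ x_s, the minimisation in Definition 3 can be carried out by the classical pool-adjacent-violators algorithm (PAVA); general pre-orders require more general projection methods. A minimal sketch, with function and variable names that are illustrative only:

```python
# Weighted pool-adjacent-violators algorithm (PAVA): computes the isotonic
# regression of Definition 3 for the special case of a SIMPLE order
# x_1 << x_2 << ... << x_s. Names here are illustrative, not from the paper.

def isotonic_regression(g, w):
    """Return g*, the weighted least-squares projection of g onto the set of
    non-decreasing vectors: minimises sum_i w_i * (g_i - f_i)**2."""
    # Each block stores [weighted mean, total weight, number of points pooled].
    blocks = []
    for gi, wi in zip(g, w):
        blocks.append([gi, wi, 1])
        # Pool adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, n1 + n2])
    # Expand pooled blocks back to a full-length solution.
    result = []
    for mean, _, count in blocks:
        result.extend([mean] * count)
    return result

print(isotonic_regression([3.0, 1.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
# → [2.0, 2.0, 2.0, 4.0]
```

The violating pair (3, 1) is pooled to its weighted average 2, after which the fit is non-decreasing.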
Conditions for existence and uniqueness of g * will be stated below. Similarly one can define an isotonic vector in R s , with s ≤ ∞, and the isotonic regression of an arbitrary vector in R s . Let us consider a set of indices I = {1, . . . , s}, with s ≤ ∞, with some pre-order ≪ defined on it.
We denote the set of isotonic vectors in R^s, with s ≤ ∞, by F^is = F^is(I).
In the case of an infinite index set we consider the square summable vectors (with weights w) from l w 2 , the Hilbert space of all square summable vectors with weights w.
Definition 5 A vector θ* ∈ R^s, with s ≤ ∞, is the isotonic regression of an arbitrary vector θ ∈ R^s (or θ ∈ l^w_2, if s = ∞) over the pre-ordered index set I with weights w ∈ R^s_+ if

θ* = argmin_{φ ∈ F^is(I)} ∑_{i=1}^s (θ_i − φ_i)^2 w_i.

Given a set X with a pre-order ≪ on it one can generate a pre-order on the set I = {1, 2, . . .} of indices of the domain in X as follows. For i_1, i_2 ∈ I, i_1 ≪ i_2 if and only if x_{i_1} ≪ x_{i_2}. This pre-order on the index set I will be called the pre-order induced by the set X and will be denoted by the same symbol ≪. Conversely, if one starts with the set I consisting of the indices of the elements in X, and ≪ is a pre-order on I, the above correspondence defines a pre-order on X. Therefore, in the sequel of the paper a bold symbol, e.g. g, will denote a vector in R^s, with s ≤ ∞, whose i-th component is given by g_i = g(x_i), for i = 1, . . . , s, where g(x) is a bounded real-valued function on X. In this case we will say that the vector g corresponds to the function g(x) on X and vice versa.
Corollary 1 A real valued function f (x) on the countable set X with the pre-order ≪, defined on it, is isotonic if and only if its corresponding vector f ∈ R s , with s ≤ ∞, is an isotonic vector with respect to the corresponding pre-order ≪ on its index set I = {1, 2, . . . }, induced by the pre-order on X . A real valued function g * (x) on the set X is the isotonic regression of a function g(x) with weights w if and only if its corresponding vector g * ∈ R s is the isotonic regression of the vector g ∈ R s with respect to the corresponding pre-order ≪ on its index set I = {1, 2, . . . } with weights w.
To state the inference problem treated in this paper, suppose that X is a finite or infinite countable pre-ordered set and g̃ ∈ F^is is a fixed unknown function. Suppose we are given observations z_i, i = 1, . . . , n, independent or not, that depend on the (parameter) g̃ in some way. In the sequel we will treat in detail two important cases: the data z_1, . . . , z_n are observations of either of
(i) Z_i, i = 1, . . . , n, independent identically distributed random variables taking values in X, with probability mass function g̃;
(ii) (x_i, Y_i), i = 1, . . . , n, with x_i deterministic (design) points in X and Y_i real-valued random variables defined in the regression model Y_i = g̃(x_i) + ε_i, where ε_i is a sequence of identically distributed random variables with mean zero.

Now assume that ĝ_n = ĝ_n(z_1, . . . , z_n) is an R^s-valued statistic. We will call the sequence {ĝ_n}_{n≥1} the basic estimator of g̃. In order to discuss consistency and asymptotic distribution results we introduce the following basic topologies: when s < ∞, we study the Hilbert space R^s with the inner product ⟨g_1, g_2⟩ = ∑_{i=1}^s g_{1,i} g_{2,i} w_i, for g_1, g_2 ∈ R^s, endowed with its Borel σ-algebra B = B(R^s); when s = ∞, we study the space l^w_2 with the inner product ⟨g_1, g_2⟩ = ∑_{i=1}^∞ g_{1,i} g_{2,i} w_i, and we equip l^w_2 with its Borel σ-algebra B = B(l^w_2). Now define the isotonized estimator ĝ*_n as the isotonic regression of ĝ_n with weights w. The main goal of this paper is to study the asymptotic behaviour of ĝ*_n, as n → ∞.
We make the following assumptions on the basic estimator ĝ_n, for the finite, s < ∞, and the infinite, s = ∞, support case, respectively.

Assumption 1 Suppose that s < ∞. Assume that ĝ_n →^p g̃ for some g̃ ∈ F^is and B_n(ĝ_n − g̃) →^d λ, where λ is a random vector in (R^s, B) and B_n is a diagonal s × s matrix with elements [B_n]_{ii} = n^{q_i}, with q_i real positive numbers.
Assumption 2 Suppose that s = ∞. Let ĝ_n, for n = 1, 2, 3, . . ., be a tight sequence of random vectors taking values in the Hilbert space l^w_2. Assume that ĝ_n →^p g̃ for some g̃ ∈ F^is, and B_n(ĝ_n − g̃) →^d λ, where λ is a random vector in (l^w_2, B) and B_n is a linear operator l^w_2 → l^w_2 such that for any g ∈ l^w_2 it holds that (B_n g)_i = n^{q_i} g_i, with q_i real positive numbers. Suppose also that any finite s-dimensional cylinder set in l^w_2 is a continuity set for the law of λ.
Note that the matrix B_n in Assumption 1 and the operator B_n in Assumption 2 allow for different rates of convergence for different components of ĝ_n; the values of q_i will be specified later.
For a general introduction to the subject of constrained inference we refer to the monographs Barlow R. E. et al. [4], Robertson T. et al. [16], Silvapulle M. J. [17] and Groeneboom P. et al. [12]. In these monographs the problem of isotonic regression has been considered in different settings, and in particular basic questions such as existence and uniqueness of the estimators have been addressed. In Lemmas 1 and 7 below we list those properties which will be used in the proofs.
The asymptotic behaviour of regression estimates over a continuous setup under monotonicity restrictions was first studied in [9,19], where it was shown that the difference of the regression function and its estimate multiplied by n^{1/3}, at a point with a positive slope, has a nondegenerate limiting distribution. In [1] the authors studied a general asymptotic scheme for order constrained inference. The problem of probability density estimation was studied, for example, in [10,11,15]. In the discrete case some recent results are [5,6,13]. This work is mainly motivated by the results obtained in [6,13]. In [13] the problem of estimation of a discrete monotone distribution was studied in detail. It was shown that the limiting distribution of the constrained maximum likelihood estimator (mle) of a probability mass function (pmf) is a concatenation of the isotonic regressions of Gaussian vectors over the periods of constancy of the true pmf p, cf. Theorem 3.8 in [13]. In the derivation of the limiting distribution in [13] the authors used the strong consistency of the empirical estimator of p as well as the fact that the constrained mle in the case of decreasing constraints is given by the least concave majorant (lcm) of the empirical cumulative distribution function (ecdf).
The problem of the mle of a unimodal pmf was studied in [6]. That problem is different from the one considered here, since [6] treats only pmfs on Z, whereas we are able to treat multivariate problems with our approach. The article [6] also contains substantial references to applications where one deals with discrete or discretized data.
Recall that in our work we do not require strong consistency of the basic estimator ĝ_n, and we consider general pre-order constraints, where the expression for an isotonic regression is more complicated than the lcm of the ecdf, cf. Assumptions 1 and 2. Also, it turns out that the limiting distribution of the isotonised estimator ĝ*_n can be split further than into the level sets of g̃, which are the analogue of the periods of constancy of g̃ in the univariate case.
The remainder of this paper is organised as follows. In Section 2 we consider the finite dimensional case, i.e. s < ∞. Theorem 1 gives the asymptotic distribution of the isotonised estimator. Next, in Section 3 we consider the infinite case, which is quite different from the finite one. Theorem 3 describes the asymptotic behaviour of the isotonised estimator for the infinite dimensional case. In Section 4 we discuss the application of the obtained results to the problems of estimation of a bimonotone regression function and of a bimonotone probability mass function, respectively.

Case of finitely supported functions
Let us assume that s < ∞, i.e. that the basic estimator {ĝ n } n≥1 is a sequence of finite-dimensional vectors. The next lemma states some well-known general properties of the isotonic regression of a finitely supported function.
(iii) ĝ*_n, viewed as a mapping from R^s into R^s, is continuous. Moreover, it is also continuous viewed as a function of the 2s-tuple of real numbers (w_1, w_2, . . . , w_s, g_1, g_2, . . . , g_s), with w_i > 0.
Proof. The statements (i), (ii), (iii) and (iv) are from [16]. ✷
Note that statement (ii) means that if the basic estimator ĝ_n satisfies a linear restriction, e.g. ∑_{i=1}^s w_i ĝ_{n,i} = c, with some positive reals w_i, then the same holds for its isotonic regression with the weights w, i.e. for ĝ*_n one has ∑_{i=1}^s w_i ĝ*_{n,i} = c.
We make a partition of the original set into comparable sets X^(1), . . . , X^(k), where each partition set X^(v) contains elements such that if x ∈ X^(v), then x is comparable with at least one different element in X^(v) (if there are any), but not with any element in X^(µ) for any µ ≠ v. In fact, the partition can be constructed even for an infinite set X = {x_1, x_2, . . .} in the following way. We first construct the set X^(1), containing x_1, iteratively, as follows. Define X^(1)_0 = {x_1} and append to it all points comparable to x_1, i.e.

X^(1)_1 = X^(1)_0 ∪ {x_i ∈ X \ X^(1)_0 : x_i ∼ x_j for some x_j ∈ X^(1)_0}.

Note that x_i ∼ x_j for some x_j ∈ X^(1)_0 in the above statement means exactly x_j ∼ x_1. Next, in the second iteration, we append to X^(1)_1 the points that are comparable to some x_j ∈ X^(1)_1. We iterate the construction to obtain X^(1)_2, X^(1)_3, . . . in the obvious way. This construction is countable, and in the final step gives us a (finite or infinite) set which we call X^(1).
Then either X^(1) = X, in which case we call the set X non-decomposable and we are done with the partition construction, with k = 1. Or else there is a point x_l ∈ X \ X^(1), and we can construct the next set X^(2) in the partition, iteratively, as follows. Define X^(2)_0 = {x_l}, and append to this all points outside of X^(1) that are comparable to x_l, i.e.

X^(2)_1 = X^(2)_0 ∪ {x_i ∈ (X \ X^(1)) \ X^(2)_0 : x_i ∼ x_j for some x_j ∈ X^(2)_0}.
The next iteration in the construction of X^(2) is

X^(2)_2 = X^(2)_1 ∪ {x_i ∈ (X \ X^(1)) \ X^(2)_1 : x_i ∼ x_j for some x_j ∈ X^(2)_1}.

This procedure can be iterated, producing sets X^(2)_3, . . ., and is a countable iterative construction which will terminate with a final set, which we call X^(2).
Then either X = X^(1) ∪ X^(2), and we are done with the partition construction, with k = 2. Or else there is a point x_u ∈ X \ (X^(1) ∪ X^(2)), and we can continue with the construction of the set X^(3) containing x_u.
The above procedure can be iterated in an obvious way, will exhaust the countable set X, and therefore is a countable construction of the desired partition

X = ∪_{v=1}^k X^(v).   (3)

Note that it is possible for a partition set X^(v) to have only one element. Also, the partition of X in (3) is unique.
Now assume that X is finite, that we are given the partition (3), and let g^(v)(x) denote the restriction of g(x) to the set X^(v). The functions on X^(v) which are isotonic with respect to the pre-order will be denoted by F^is(X^(v)). The next lemma states a natural result for the isotonic regression on X: it can be obtained as a concatenation of the individual isotonic regressions, over the comparable sets, of the restrictions to the comparable sets.
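The iterative construction above is, in graph terms, a breadth-first search: put an edge between two elements whenever they are comparable; the sets X^(1), . . . , X^(k) are then the connected components of this comparability graph. A minimal sketch for a finite X, with `comparable` supplied by the user (all names are illustrative):

```python
# Partition a finite pre-ordered set into its comparable components
# X^(1), ..., X^(k) by breadth-first search over the comparability graph.

from collections import deque

def partition_comparable(points, comparable):
    """Partition `points` into components: each element is linked to its
    component through a chain of pairwise-comparable elements."""
    unassigned = set(points)
    components = []
    while unassigned:
        seed = next(iter(unassigned))        # start a new component
        component, queue = {seed}, deque([seed])
        unassigned.discard(seed)
        while queue:                         # append everything reachable
            x = queue.popleft()
            for y in list(unassigned):
                if comparable(x, y):
                    component.add(y)
                    unassigned.discard(y)
                    queue.append(y)
        components.append(component)
    return components

# Bivariate example: (i, j) << (i', j') iff i <= i' and j <= j'.
pts = [(0, 1), (1, 2), (2, 0)]
comp = lambda a, b: (a[0] <= b[0] and a[1] <= b[1]) or (b[0] <= a[0] and b[1] <= a[1])
print(sorted(sorted(c) for c in partition_comparable(pts, comp)))
# → [[(0, 1), (1, 2)], [(2, 0)]]
```

Here (0, 1) ≪ (1, 2), while (2, 0) is comparable to neither, so k = 2 and the set is decomposable.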
Lemma 2 Let g(x) be an arbitrary real-valued function on the finite set X with a pre-order ≪ defined on it, and assume that the partition (3) is given. Then the isotonic regression of g(x) with any positive weights w with respect to the pre-order ≪ is given by

g*(x) = g*_(v)(x), for x ∈ X^(v), v = 1, . . . , k,

where g*_(v)(x) is the isotonic regression of the function g^(v)(x) over the set X^(v) with respect to the pre-order ≪.
Proof. Let g(x) be an arbitrary real-valued function defined on X. From the definition of the isotonic regression,

min_{f ∈ F^is(X)} ∑_{x∈X} (g(x) − f(x))^2 w_x = min_{f ∈ F^is(X)} ∑_{v=1}^k ∑_{x∈X^(v)} (g(x) − f(x))^2 w_x = ∑_{v=1}^k min_{f^(v) ∈ F^is(X^(v))} ∑_{x∈X^(v)} (g(x) − f^(v)(x))^2 w_x,

where f^(v) is the restriction of the function f: X → R to the set X^(v). The second equality follows from (3), and the last equality follows from the fact that, since the elements from different partition sets X^(v) are noncomparable, any function f ∈ F^is can be written as a concatenation of isotonic functions f^(v) ∈ F^is(X^(v)), v = 1, . . . , k. ✷
Now let g̃(x) be the fixed function defined in Assumption 1, assume that we are given the partition (3) of X, and fix an arbitrary v ∈ {1, . . . , k}. A further partition of X^(v) is constructed in the following way. We note first that the N_v values of g̃ on X^(v) are not necessarily all unique, so there are m̃_v ≤ N_v unique values. In a first step we construct the m̃_v level sets

X̃^(v,l) = {x ∈ X^(v) : g̃(x) = g̃_{v,l}}, for l = 1, . . . , m̃_v,

where g̃_{v,1}, . . . , g̃_{v,m̃_v} are the distinct values of g̃ on X^(v). Next we note that in any non-singleton level set X̃^(v,l) there might be non-comparable points, i.e. x_i, x_j ∈ X̃^(v,l) can be such that neither x_i ≪ x_j nor x_j ≪ x_i holds. Therefore, in a second step, for each fixed l we partition (if necessary) the level set X̃^(v,l) into sets with comparable elements, analogously to the construction of (3). We can do this for every v and end up with the partition

X = ∪_{v=1}^k ∪_{l=1}^{m_v} X^(v,l).   (5)

In the partition (5) each set X^(v,l) is characterised by two properties: (i) the function g̃(x) is constant on X^(v,l); (ii) each x ∈ X^(v,l) is comparable with at least one other element of X^(v,l) (if there are any), but not with any other element of the same level set of g̃. We have therefore proved the following lemma.
Lemma 3 For any countable set X with the pre-order ≪ and any isotonic function g̃(x) defined on it, there exists a unique partition X = ∪_{v=1}^k ∪_{l=1}^{m_v} X^(v,l) satisfying the statements (i) and (ii) above. For the index set I with the pre-order ≪ generated by the set X and any isotonic function g̃(x) defined on X, there exists a unique partition I = ∪_{v=1}^k ∪_{l=1}^{m_v} I^(v,l), satisfying conditions analogous to (i) and (ii) stated above.
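In the univariate simple-order case the two-step partition of Lemma 3 is easy to visualise: consecutive indices are always comparable, so each comparable level set is simply a maximal run of constancy of g̃. A small illustrative sketch (names are not from the paper):

```python
def constancy_periods(g):
    """Split indices 0..len(g)-1 into maximal runs where g is constant;
    for a simple order these runs are the comparable level sets of g."""
    runs, start = [], 0
    for i in range(1, len(g) + 1):
        # Close the current run at the end of g or when the value changes.
        if i == len(g) or g[i] != g[start]:
            runs.append(list(range(start, i)))
            start = i
    return runs

print(constancy_periods([1, 1, 2, 2, 2, 5]))  # → [[0, 1], [2, 3, 4], [5]]
```

For a genuine pre-order the second step would additionally split each run of equal values into comparable components, as in the construction of (3).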

Definition 6
The set X will be called decomposable if k > 1 in the partition (3). The sets X^(v,l) in the partition (5) will be called the comparable level sets of g̃(x). In the corresponding partition of the index set I the sets I^(v,l) will be called the comparable level index sets of g̃.
Recall that g^(v,l)(x) is the restriction of the function g(x) to the comparable level set X^(v,l), for l = 1, . . . , m_v and v = 1, . . . , k.
In the case of a non-decomposable set the full partition will be written as X = ∪_{l=1}^m X^(l) ≡ ∪_{l=1}^{m_1} X^(1,l). Next, suppose that X is a non-decomposable set, and let us consider an arbitrary function g̃(x) ∈ F^is. Assume that for g̃(x) the partition X = ∪_{l=1}^m X^(l) in (5), satisfying (i) and (ii), has been made. Define the smallest comparable level distance of g̃ as

ε̃ = min |g̃_l − g̃_{l′}|,   (6)

where the minimum is taken over all l, l′ = 1, . . . , m, l ≠ l′, such that there exist at least one x_1 ∈ X^(l) and at least one x_2 ∈ X^(l′) with x_1 and x_2 comparable. Note that ε̃ is always finite, and in the finite support case, s < ∞, also ε̃ > 0.
Lemma 4 Consider an arbitrary real-valued function g(x) on a non-decomposable finite set X with the pre-order ≪, and let ε̃ be defined in (6). If

max_{x∈X} |g(x) − g̃(x)| < ε̃/2,

then the isotonic regression of g(x) is given by

g*(x) = g*_(l)(x), for x ∈ X^(l), l = 1, . . . , m,   (8)

where g*_(l)(x) is the isotonic regression of the function g_(l)(x) over the set X^(l) with respect to the pre-order ≪. Therefore, the function g*(x) is a concatenation of the isotonic regressions of the restrictions of g(x) to the comparable level sets of g̃(x).
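Lemma 4 can be illustrated numerically in the simple-order case: when g is uniformly within ε̃/2 of g̃, isotonizing g over all of X and isotonizing it separately over the comparable level sets of g̃ give the same answer. A sketch under these assumptions (unweighted PAVA; names illustrative):

```python
def pava(g):
    # Unweighted pool-adjacent-violators for a non-decreasing fit.
    blocks = []
    for gi in g:
        blocks.append([gi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(n1 * m1 + n2 * m2) / (n1 + n2), n1 + n2])
    return [m for m, n in blocks for _ in range(n)]

g_true = [1.0, 1.0, 2.0, 2.0]     # comparable level sets {0,1}, {2,3}; eps~ = 1
g_obs  = [1.2, 0.9, 2.3, 1.9]     # uniformly within eps~/2 = 0.5 of g_true
full   = pava(g_obs)                       # isotonize over all of X
concat = pava(g_obs[:2]) + pava(g_obs[2:]) # isotonize per level set, concatenate
print(full == concat)  # → True
```

Violations of monotonicity can only occur inside a level set of g̃ when the perturbation is below ε̃/2, which is why the pooling never crosses level-set boundaries.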
Proof. First, note that if the condition of the lemma is satisfied, then the function g*(x) defined in (8) on the set X is isotonic. This follows from Lemma 1, statement (iv). Second, assume that the function g*(x) defined in (8) is not the isotonic regression of g(x). This means that there exists another function h(x) ∈ F^is such that

∑_{x∈X} (g(x) − h(x))^2 w_x < ∑_{x∈X} (g(x) − g*(x))^2 w_x.   (9)

Using the partition of X, (9) can be rewritten as

∑_{l=1}^m ∑_{x∈X^(l)} (g(x) − h(x))^2 w_x < ∑_{l=1}^m ∑_{x∈X^(l)} (g(x) − g*(x))^2 w_x.

Therefore, for some l′ we must have

∑_{x∈X^(l′)} (g(x) − h_(l′)(x))^2 w_x < ∑_{x∈X^(l′)} (g(x) − g*_(l′)(x))^2 w_x,

with g_(l′)(x), h_(l′)(x) and g*_(l′)(x) the restrictions to the comparable level set X^(l′) of g(x), h(x) and g*(x), respectively. Since the function g*_(l′)(x) is the isotonic regression of the function g_(l′)(x) on the set X^(l′), the last inequality contradicts the existence and uniqueness of the isotonic regression. ✷
The next lemma is an auxiliary result which will be used later in the proof of the asymptotic distribution of ĝ*_n.
Lemma 5 Assume X_n and Y_n are sequences of random vectors taking values in the space R^s, for s ≤ ∞, with some metric on it, endowed with its Borel σ-algebra. If X_n →^d X and lim_{n→∞} P(X_n = Y_n) = 1, then Y_n →^d X.
Proof. This result was proved in [2]. ✷
Let us consider the sequence B_n(ĝ*_n − g̃), where ĝ*_n is the isotonic regression of ĝ_n, which was defined in Assumption 1, with the specified matrix B_n. As mentioned in Assumption 1, we allow different rates of convergence n^{q_i} for different components of ĝ_n. We however require the q_i, for i = 1, . . . , s, to be equal on the comparable level index sets I^(v,l) of g̃, i.e. the q_i, for i = 1, . . . , s, are real positive numbers such that q_{i_1} = q_{i_2} whenever i_1, i_2 ∈ I^(v,l).
We introduce an operator ϕ: R^s → R^s defined in the following way. First, for any vector θ ∈ R^s we define the coordinate evaluation map θ(x): X → R, corresponding to the vector θ, by θ(x_i) = θ_i, for i = 1, . . . , s.
Then, let θ*_(v′,l′)(x) be the isotonic regression of the restriction of θ(x) to the comparable level set X^(v′,l′) of g̃(x), and define

[ϕ(θ)]_i = θ*_(v′,l′)(x_i), where v′ and l′ are such that x_i ∈ X^(v′,l′).   (10)

The asymptotic distribution of B_n(ĝ*_n − g̃) is given in the following theorem.

Theorem 1 Suppose that Assumption 1 holds. Then

B_n(ĝ*_n − g̃) →^d ϕ(λ), as n → ∞,

where ϕ is the operator defined in (10).
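In the simple-order case the operator ϕ has a concrete form: split the index set into the constancy runs of g̃ and isotonize the argument separately on each run, then concatenate. A hedged sketch of this (helper names are illustrative, not from the paper):

```python
def pava(g):
    # Unweighted pool-adjacent-violators for a non-decreasing fit.
    blocks = []
    for gi in g:
        blocks.append([gi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(n1 * m1 + n2 * m2) / (n1 + n2), n1 + n2])
    return [m for m, n in blocks for _ in range(n)]

def phi(theta, g_true):
    """Apply isotonic regression separately on each constancy run of g_true
    (the comparable level sets for a simple order) and concatenate."""
    out, start = [], 0
    for i in range(1, len(g_true) + 1):
        if i == len(g_true) or g_true[i] != g_true[start]:
            out += pava(theta[start:i])
            start = i
    return out

# theta plays the role of a realisation of the limit λ; g_true of g̃.
print(phi([0.5, -0.2, 0.1, 0.3], [1.0, 1.0, 2.0, 2.0]))
```

The first run {0, 1} has a violation and gets pooled, while the second run {2, 3} is already isotonic and is left untouched, so ϕ acts only within level sets of g̃.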

Proof.
First, from Lemma 3 we have that any pre-ordered set X can be uniquely partitioned as

X = ∪_{v=1}^k X^(v)   (11)

and

X^(v) = ∪_{l=1}^{m_v} X^(v,l),   (12)

with the partition (12) of X^(v) determined by the isotonic vector g̃. Second, as shown in Lemma 2, the isotonic regression of g(x) on the original set X can be obtained as a concatenation of the separate isotonic regressions of the restrictions of g(x) to the non-decomposable sets in the partition (3). Therefore, without loss of generality, we can assume that the original set X is non-decomposable. Thus, any x ∈ X is comparable with at least one different element of X, k = 1, and

X = ∪_{l=1}^m X^(l), with X^(l) ≡ X^(1,l) and g̃_{1,l} ≡ g̃_l.

Note that we have dropped the index v. Third, since ĝ_n is consistent, by Assumption 1, for any ε > 0

P( max_{1≤i≤s} |ĝ_{n,i} − g̃_i| > ε ) → 0,   (13)

as n → ∞. Note that the comparable level distance ε̃ of g̃, defined in (6), satisfies ε̃ > 0, and take ε = ε̃/2. Then from Lemma 4 we obtain

{ max_{1≤i≤s} |ĝ_{n,i} − g̃_i| < ε̃/2 } ⊆ { ĝ*_n = ϕ(ĝ_n) }.   (14)

Therefore, (13) and (14) imply

P( B_n(ĝ*_n − g̃) = B_n(ϕ(ĝ_n) − g̃) ) → 1,   (15)

as n → ∞. Next, since the isotonic regression is a continuous map (statement (iii) of Lemma 1), the operator ϕ is a continuous map from R^s to R^s. Therefore, using the continuous mapping theorem [18], we get

ϕ(B_n(ĝ_n − g̃)) →^d ϕ(λ),   (16)

as n → ∞. Furthermore, using statement (vi) of Lemma 1 and taking into account the definition of the matrix B_n, we get

ϕ(B_n(ĝ_n − g̃)) = B_n(ϕ(ĝ_n) − g̃).   (17)
Then (15), (16) and (17) imply that

B_n(ϕ(ĝ_n) − g̃) →^d ϕ(λ),   (18)

as n → ∞. Finally, using Lemma 5, from (15) and (18) we conclude that

B_n(ĝ*_n − g̃) →^d ϕ(λ),

as n → ∞. ✷
For a given pre-order ≪ on X there exists a matrix A such that Ag ≥ 0 is equivalent to g being isotonic with respect to ≪, cf. Proposition 2.3.1 in [17]. Therefore, if there are no linear constraints imposed on the basic estimator ĝ_n, Theorem 1 can also be established by using the results on estimation when a parameter is on a boundary, in Section 6 of [3].
Assume that each vector ĝ_n satisfies the linear constraint ∑_{i=1}^s ĝ_{n,i} w_i = c (for example, in the case of estimation of a probability mass function it would be ∑_{i=1}^s ĝ_{n,i} = 1). Then the expression for the limiting distribution in Theorem 1 does not follow directly from the results in [3], because ĝ_n is linearly constrained. However, the result of Theorem 1 still holds, because, as established in statement (ii) of Lemma 1, the isotonic regression with weights w preserves the corresponding linear constraint.
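The preservation of the linear constraint is easy to see numerically in the simple-order case: weighted PAVA only replaces values by weighted block averages, so the weighted sum ∑_i w_i ĝ_{n,i} is unchanged. A sketch (illustrative names; simple order only):

```python
def pava_w(g, w):
    # Weighted PAVA: non-decreasing weighted least-squares fit.
    blocks = []
    for gi, wi in zip(g, w):
        blocks.append([gi, wi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, n1 + n2])
    return [m for m, wt, n in blocks for _ in range(n)]

g = [0.4, 0.3, 0.1, 0.2]   # e.g. an empirical pmf, summing to 1
w = [1.0, 1.0, 1.0, 1.0]
g_star = pava_w(g, w)
# The weighted sums agree (up to floating-point rounding):
print(round(sum(gi * wi for gi, wi in zip(g, w)), 10))       # → 1.0
print(round(sum(gi * wi for gi, wi in zip(g_star, w)), 10))  # → 1.0
```

Each pooling step replaces two block means by their weighted average with the combined weight, which leaves the total ∑_i w_i g_i invariant; this is the mechanism behind statement (ii) of Lemma 1.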
Next we consider the case when the vector of weights w is not constant, i.e. we assume that some non-random sequence {w_n}_{n≥1}, where each vector w_n satisfies condition (1), converges to a non-random vector w, which also satisfies (1). We denote by θ*_w(x) the isotonic regression of θ(x) with weights w and, analogously to (10), we introduce the notation

[ϕ_w(θ)]_i = θ*_{w,(v′,l′)}(x_i),   (19)

where θ*_{w,(v′,l′)}(x) is the isotonic regression, with weights w, of the restriction of θ(x) to the comparable level set X^(v′,l′) of g̃(x), and the indices v′ and l′ are such that x_i ∈ X^(v′,l′). Let ĝ*^{w_n}_n denote the isotonic regression, with weights w_n, of the basic estimator ĝ_n. The next theorem gives the limiting distribution of ĝ*^{w_n}_n.

Theorem 2 Suppose that Assumption 1 holds. Then the asymptotic distribution of the isotonic regression ĝ*^{w_n}_n of the basic estimator ĝ_n is given by

B_n(ĝ*^{w_n}_n − g̃) →^d ϕ_w(λ), as n → ∞,

where ϕ_w is the operator defined in (19).
Proof. Without loss of generality, we can assume that the original set X is non-decomposable. First, since the sequence ĝ_n is consistent, for any ε > 0

P( max_{1≤i≤s} |ĝ_{n,i} − g̃_i| > ε ) → 0,   (21)

as n → ∞; take ε = ε̃/2, with ε̃ taken from Lemma 4. Using the statement of Lemma 4, we obtain

{ max_{1≤i≤s} |ĝ_{n,i} − g̃_i| < ε̃/2 } ⊆ { ĝ*^{w_n}_n = ϕ_{w_n}(ĝ_n) }.   (22)

Note that the result of Lemma 4 holds for any weights w_n. Therefore, from (21) and (22) we have

P( B_n(ĝ*^{w_n}_n − g̃) = B_n(ϕ_{w_n}(ĝ_n) − g̃) ) → 1,   (23)

as n → ∞. Second, from statement (iii) of Lemma 1, the operators ϕ_{w_n}, ϕ_w are continuous maps from R^{2s} to R^s, for all weights w_n, w satisfying (1). Using the (extended) continuous mapping theorem [18], we get

ϕ_{w_n}(B_n(ĝ_n − g̃)) →^d ϕ_w(λ),   (24)

as n → ∞, where w is the limit of the sequence {w_n}_{n≥1}.
Third, using statement (vi) of Lemma 1 and the definition of the matrix B_n, we obtain

ϕ_{w_n}(B_n(ĝ_n − g̃)) = B_n(ϕ_{w_n}(ĝ_n) − g̃).

Then, combining the last three displays with Lemma 5, as in the proof of Theorem 1, yields the statement of the theorem. ✷

Case of infinitely supported functions
In this section we assume that the original set X = {x_1, x_2, . . .} is an infinite countable enumerated set with a pre-order ≪ defined on it.
In the case of the infinitely supported functions the isotonic regression's properties are similar to the ones in the finite case, but the proofs are usually different. For completeness we state these properties in the following lemma.
Proof. Statements (i), (ii) and (iii) follow from Theorem 8.2.1, Corollary B of Theorem 8.2.7 and Theorem 8.2.5, respectively, in [16]; statements (iv), (v) and (vi) follow from Corollary B of Theorem 7.9, Theorem 2.2, and Theorems 7.5 and 7.8, respectively, in [4]. ✷
We partition the original set X in the same way as it was done in the finite case, i.e., first, let

X = ∪_{v=1}^k X^(v),   (27)

where k ≤ ∞ is the number of sets and each set X^(v) is such that if x ∈ X^(v), then x is comparable with at least one different element in X^(v) (if there are any), but not with any element which belongs to another term in the partition. Note that X^(v) can have only one element. The partition of X is unique. Furthermore, for the fixed function g̃ ∈ F^is defined in Assumption 2 (or, equivalently, its corresponding isotonic vector g̃ ∈ F^is), we partition each set X^(v) in (27) into the comparable level sets of g̃, i.e.
in the same way as it was done in the finite case in (5), i.e.

X^(v) = ∪_{l=1}^{m_v} X^(v,l).

Note that since g̃ ∈ l^w_2 and condition (1) is satisfied, the cardinality of any set X^(v,l) is finite whenever g̃_{v,l} ≠ 0; otherwise we would have ∑_{i: x_i ∈ X^(v,l)} g̃_{v,l}^2 w_i = ∞, contradicting g̃ ∈ l^w_2. The set X^(v,l) can have infinitely many elements only if g̃_{v,l} = 0.
For the partition in (27) we obtain a result similar to the one obtained in Lemma 2 for the finite case.

Lemma 7 Let g(x) be an arbitrary real-valued function in l^w_2 on the set X with a pre-order ≪ defined on it. Then the isotonic regression of g(x) with any positive weights w is given by

g*(x) = g*_(v)(x), for x ∈ X^(v), v = 1, . . . , k,

where g*_(v)(x) is the isotonic regression of the restriction of the function g(x) to the set X^(v), over this set, with respect to the pre-order ≪.
Proof. The proof is exactly the same as in the finite case (Lemma 2). ✷
As a consequence of Lemma 7, without loss of generality, in the sequel of the paper we can assume that the original set X is non-decomposable and use the same notation as in the finite case, i.e. X = ∪_{l=1}^m X^(l) ≡ ∪_{l=1}^{m_1} X^(1,l) and, respectively, g_(l)(x) ≡ g_(1,l)(x) for the restriction of the function g(x) to the set X^(l).
In the case of an infinite support the result of Lemma 4 is generally not applicable, because the value of ε̃ can be zero. We therefore make the following slight modification of Lemma 4. Thus, assume that for a function g̃(x) ∈ F^is we have made the partition X = ∪_{l=1}^m X^(l), with m ≤ ∞. Furthermore, for any finite positive integer m′ < m ≤ ∞ we choose m′ comparable level sets X^(l_j), such that the values of the function g̃(x) on them satisfy |g̃_{l_1}| ≥ |g̃_{l_2}| ≥ · · · ≥ |g̃_{l_{m′}}|. Next, we rewrite the partition as

X = X^(l_1) ∪ X^(l_2) ∪ · · · ∪ X^(l_{m′}) ∪ X^(l_{m′+1}),

where X^(l_{m′+1}) = X \ (X^(l_1) ∪ X^(l_2) ∪ · · · ∪ X^(l_{m′})). Define ε̃′, analogously to (6), as the smallest comparable level distance among the chosen sets X^(l_1), . . . , X^(l_{m′}), so that ε̃′ > 0.

Lemma 8 Consider an arbitrary real-valued function g(x) ∈ l^w_2 on a non-decomposable infinite countable set X with the pre-order ≪ defined on it. If for some g̃(x) ∈ F^is we have

sup_{x∈X} |g(x) − g̃(x)| < ε̃′/2,

then the isotonic regression of g(x) is given by

g*(x) = g*_(l′)(x), for x ∈ X^(l′), l′ ∈ {l_1, . . . , l_{m′}, l_{m′+1}},

where g*_(l′)(x) is the isotonic regression of the function g_(l′)(x) over the set X^(l′) with respect to the pre-order ≪. Therefore, the function g*(x) is a concatenation of the isotonic regressions of the restrictions of g(x) to the sets X^(l_1), X^(l_2), . . . , X^(l_{m′}) and X^(l_{m′+1}).
Proof. The proof is exactly the same as in the case of a finite support (Lemma 4). ✷
Next we state and prove an auxiliary lemma which will be used in the final theorem.

Lemma 9 Let Z_n, for n = 1, 2, . . ., be a tight sequence of random vectors taking values in l^w_2, and let Z be a random vector in l^w_2. Suppose that, for every finite s, Z̃^(1,s)_n →^d Z̃^(1,s), where Z̃^(1,s)_n and Z̃^(1,s) are vectors in R^s constructed from the elements of the vectors Z_n and Z in the way that the j-th elements of Z̃^(1,s)_n and Z̃^(1,s) are equal to the ĩ_j-th elements of the vectors Z_n and Z, respectively, with ĩ_j being the j-th index from the rearranged index set Ĩ. In addition, assume that any cylinder set in l^w_2 is a continuity set for the law of Z̃^(1,s). Then Z_n →^d Z.
Proof. The space l^w_2 is separable and complete. Then, from Prokhorov's theorem [21], it follows that the sequence Z_n is relatively compact, which means that every subsequence of Z_n contains a further subsequence which converges weakly to some vector. If the limits of the convergent subsequences are all the same, then the result of the lemma holds.
Since the space l^w_2 is separable, the Borel σ-algebra equals the σ-algebra generated by the open balls in l^w_2 [8]. Therefore, it is enough to show that the limit laws agree on the open balls, since the finite intersections of the open balls in l^w_2 constitute a π-system. To show this, we note that an open ball in l^w_2 can be written as

B(z, ε) = {y ∈ l^w_2 : ∑_{i=1}^∞ (y_i − z_i)^2 w_i < ε^2} = ∪_{ε′<ε} ∩_{s=1}^∞ {y ∈ l^w_2 : ∑_{i=1}^s (y_i − z_i)^2 w_i ≤ ε′^2},

so that its probability under a limit law P is determined by the limits of the finite-dimensional distributions, where P is the law of Z. Thus, we have shown that the limit laws, P, of the convergent subsequences of {Z_n} agree on the open balls B(z, ε) and, therefore, also on the finite intersections of these open balls. Since the laws agree on the π-system (they are all equal to P), they agree on the Borel σ-algebra. ✷
Finally, the next theorem gives the limiting distribution of ĝ*_n. Similarly to the finite case, we introduce the operator ϕ: l^w_2 → l^w_2, defined in the following way. For any vector θ ∈ l^w_2 we consider the coordinate evaluation map θ(x): X → R, defined as θ(x_i) = θ_i, for i = 1, 2, . . .. Then, let

[ϕ(θ)]_i = θ*_(v′,l′)(x_i),   (32)

where θ*_(v′,l′)(x) is the isotonic regression of the restriction of θ(x) to the set X^(v′,l′) in the partition of X, and the indices v′ and l′ are such that x_i ∈ X^(v′,l′). The restriction of ϕ(θ) to the comparable level index set I^(v,l) will be denoted by [ϕ(θ)]^(v,l).

Theorem 3 Suppose that Assumption 2 holds. Then the asymptotic distribution of the isotonised estimator ĝ*_n is given by

B_n(ĝ*_n − g̃) →^d ϕ(λ), as n → ∞,

where ϕ is the operator defined in (32).
Proof. Let us consider the partition of the original set X = ∪_{l=1}^m X^(l) made for the function g̃(x). As was shown above, the cardinality |X^(l)| of each comparable level set in the partition must be finite, unless g̃_l = 0, in which case it can be infinite. Since g̃ ∈ l^w_2, if the number of terms in the partition is finite, i.e. m < ∞, then some terms (or just one) in the partition are such that the function g̃(x) is equal to zero on them, i.e. g̃_l = 0. Therefore, in this case we can use the same approach as in the case of a finite set X (Lemma 4), because in this case the smallest comparable level distance ε̃, defined in (6), is greater than zero.
Therefore, further in the proof we assume that m = ∞ and write the partition as X = ∪_{l=1}^∞ X^(l). First, for any positive integer m′ < ∞ let us take m′ terms from the partition of X which satisfy |g̃_{l_1}| ≥ |g̃_{l_2}| ≥ · · · ≥ |g̃_{l_{m′}}|.
Therefore, letting ε = ε̃′/2 from Lemma 8 and using its result, for the isotonic regression ĝ*_n of ĝ_n we obtain

P( ĝ*_{n,i} = [ϕ(ĝ_n)]_i, for all i ∈ I^(l_1) ∪ · · · ∪ I^(l_{m′}) ) → 1,   (34)

as n → ∞, where I^(l′), for l′ ∈ {l_1, . . . , l_{m′}}, are the comparable level index sets and I^(m′+1) is the index set of X^(m′+1) = X \ (X^(l_1) ∪ X^(l_2) ∪ · · · ∪ X^(l_{m′})). Third, let us introduce a linear operator A^(m′): l^w_2 → R^s, with s = ∑_{l∈{l_1,...,l_{m′}}} |X^(l)|, such that for any g ∈ l^w_2 the first |X^(l_1)| elements of the vector A^(m′)g are the ones taken from g whose indices are in I^(l_1), the second |X^(l_2)| elements are the ones from g whose indices are in I^(l_2), and so on. Therefore, using the result in (34), the definition of B_n and statement (vi) of Lemma 1, the following holds:

P( A^(m′) B_n(ĝ*_n − g̃) = A^(m′) B_n(ϕ(ĝ_n) − g̃) ) → 1,   (35)

as n → ∞. Next, since ϕ is a continuous map, which follows from statement (iii) of Lemma 7, and A^(m′) is a linear operator, from the continuous mapping theorem and the delta method [18] it follows that

A^(m′) B_n(ϕ(ĝ_n) − g̃) = A^(m′) ϕ(B_n(ĝ_n − g̃)) →^d A^(m′) ϕ(λ),

and, using Lemma 5 and the result in (35), we prove

A^(m′) B_n(ĝ*_n − g̃) →^d A^(m′) ϕ(λ).

Note that the number m′ is an arbitrary finite integer. Also, since ϕ is a continuous map, the law of ϕ(λ) has the same continuity sets as that of λ. Using Lemma 9 we finish the proof of the theorem. ✷
Recall that the cardinality of any comparable level set X^(v,l) is finite whenever g̃_{v,l} ≠ 0. Then, as in the finite case, we note that the order constraints on X^(v,l) can be expressed in the form Ag ≥ 0, for some matrix A. Therefore, one can use the results in [3] to describe the behaviour of [ϕ(λ)]^(v,l) when |X^(v,l)| < ∞. It follows from Theorem 5 in [3] that the distribution of [ϕ(λ)]^(v,l) is a mixture of 2^{|X^(v,l)|} distributions of the projections of λ onto the cones A_t g ≥ 0, where the matrices A_t, for t = 1, . . . , 2^{|X^(v,l)|}, are comprised of rows of the matrix A.
Next, let us consider the case of non-constant weights w. In this section until now we have assumed that the vector of weights satisfies condition (1) and is fixed, w_n = w, i.e. it does not depend on n, and the random elements ĝ_n in Assumption 2 all take their values in (l^w_2, B), for some fixed w, with B the Borel σ-algebra generated by the topology which is generated by the natural norm of l^w_2. Now we consider a non-random sequence {w_n}_{n≥1}, taking values in the space R^∞, where each w_n satisfies condition (1), which converges in some norm || · ||_R on R^∞ to a non-random vector w, which also satisfies condition (1). Next, let B_n denote the Borel σ-algebra generated by the topology which is generated by the natural norm in l^{w_n}_2. The next lemma shows that the normed spaces l^{w_n}_2 are all equivalent.

Lemma 10 Let two vectors w_1 and w_2 satisfy condition (1). Then the normed spaces l^{w_1}_2 and l^{w_2}_2 are equivalent.
Proof. First, we prove that if w satisfies the condition in (1), then x ∈ l_2^w if and only if x ∈ l_2 (l_2 is the space of all square summable sequences, i.e. of all x with Σ_i x_i^2 < ∞). Indeed, if x ∈ l_2, then Σ_i x_i^2 w_i ≤ sup_i{w_i} Σ_i x_i^2 < ∞, so x ∈ l_2^w. Conversely, since inf_i{w_i} > 0, we have Σ_i x_i^2 ≤ (inf_i{w_i})^{-1} Σ_i x_i^2 w_i < ∞, so x ∈ l_2. Second, let ||·||_w and ||·|| denote the natural norms in l_2^w and l_2. If w satisfies the condition in (1), then these norms are equivalent, i.e. there exist two positive constants c_1 and c_2 such that c_1 ||x||^2 ≤ ||x||_w^2 ≤ c_2 ||x||^2 for all x ∈ l_2. (37) Taking, for example, c_1 = inf_i{w_i} and c_2 = sup_i{w_i} proves (37).
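The two-sided bound in (37), with c_1 = inf_i{w_i} and c_2 = sup_i{w_i}, can be sanity-checked numerically. In this sketch the vector x and the weights w are arbitrary illustrative choices, with the weights bounded away from zero and infinity as in condition (1):

```python
import random

random.seed(0)
x = [random.gauss(0, 1) for _ in range(50)]
# weights bounded away from 0 and infinity, as condition (1) requires
w = [random.uniform(0.5, 2.0) for _ in range(50)]

norm_sq = sum(v * v for v in x)                      # ||x||^2 in l_2
norm_w_sq = sum(v * v * wi for v, wi in zip(x, w))   # ||x||_w^2 in l_2^w
c1, c2 = min(w), max(w)
assert c1 * norm_sq <= norm_w_sq <= c2 * norm_sq
```

The assertion holds for any x and any weights with 0 < c_1 ≤ c_2 < ∞, since each term x_i^2 w_i is squeezed between c_1 x_i^2 and c_2 x_i^2.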
Therefore, since the equivalence of norms is transitive, l_2^{w_1} and l_2^{w_2} are equivalent, provided w_1 and w_2 satisfy the condition in (1). ✷ Therefore, since the normed spaces l_2^{w_n} are all equivalent, the topologies generated by these norms coincide. Then the Borel σ-algebras B_n generated by these topologies are also the same. Therefore, the measurable spaces (l_2^{w_n}, B_n) are all the same and we will suppress the index n. Next, analogously to the finite case, let us introduce the notation ϕ_w(θ) in (38), where θ*_{w,(v′,l′)}(x) is the isotonic regression with weights w of the restriction of θ(x) to the comparable level set X^{(v′,l′)} of g̃(x), where the indices v′ and l′ are such that x_i ∈ X^{(v′,l′)}. The next theorem gives the limiting distribution of ĝ_n^{*w_n}. Theorem 4 Suppose Assumption 2 holds. Then the asymptotic distribution of the isotonic regression ĝ_n^{*w_n} of the basic estimator ĝ_n is given by where ϕ_w is the operator defined in (38).
Proof. First, we note that the result of Lemma 9 holds if we assume that the random vectors Z_n, for n = 1, . . . , ∞, take their values in l_2^{w_n}, provided all elements of w_n and of its limit w satisfy the condition in (1). This follows from the fact that the measurable spaces (l_2^{w_n}, B_n) are equivalent, which was proved in Lemma 10.
The rest of the proof is exactly the same as for Theorem 3, with ϕ and ĝ_n^* suitably replaced by ϕ_w and ĝ_n^{*w_n}. Also, recall that the result of Lemma 8 does not depend on the weights w_n. ✷

Examples
In this section we consider the problems of estimation of a bimonotone regression function and of a bimonotone probability mass function. First, let us introduce a bimonotone order relation ≪ on a set X := {x = (i, j)^T : i = 1, 2, . . . , r, j = 1, 2, . . . , s}, with r, s ≤ ∞, in the following way. For any x_1 and x_2 in X we have x_1 ≪ x_2 if and only if x_{1,1} ≥ x_{2,1} and x_{1,2} ≥ x_{2,2}. The order relation ≪ is a partial order, because it is reflexive, transitive and antisymmetric, but there are elements in X which are noncomparable. A real-valued function g(x) is bimonotone if whenever x_1 ≪ x_2 one has g(x_1) ≤ g(x_2), c.f. [7].
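The bimonotone order just defined is easy to check mechanically on a finite grid. The following sketch (with illustrative helper names) encodes the relation ≪ and verifies bimonotonicity of a function given as a dictionary; note that a function decreasing in both indices is the one that is isotonic with respect to this order:

```python
def preceq(x1, x2):
    """x1 << x2 iff x1 dominates x2 componentwise, as defined above."""
    return x1[0] >= x2[0] and x1[1] >= x2[1]

def is_bimonotone(g, r, s):
    """Check g(x1) <= g(x2) for every comparable pair x1 << x2
    on the r-by-s grid of index pairs."""
    grid = [(i, j) for i in range(1, r + 1) for j in range(1, s + 1)]
    return all(g[a] <= g[b] for a in grid for b in grid if preceq(a, b))

# g(i, j) = -(i + j) decreases in both indices, hence is bimonotone here
g_good = {(i, j): -(i + j) for i in range(1, 4) for j in range(1, 4)}
# g(i, j) = i + j increases in both indices, violating the constraints
g_bad = {(i, j): i + j for i in range(1, 4) for j in range(1, 4)}
```

Here `is_bimonotone(g_good, 3, 3)` holds while `is_bimonotone(g_bad, 3, 3)` fails, since for example (2, 1) ≪ (1, 1) requires g(2, 1) ≤ g(1, 1).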

Estimation of a bimonotone regression function
The problem of estimation of a bimonotone regression function via least squares was studied in detail in [7], where the authors describe an algorithm for the minimization of a smooth function under bimonotone order constraints.
The least squares estimate of g(x) under bimonotone constraints is given by (40), where F_is denotes the set of all bounded bimonotone functions on X and ĝ_n(x), defined in (41), is the average of Y_i, i = 1, . . . , n, over the design element x. Note that ĝ_n(x) in (41) is the unconstrained least squares estimate of g(x). The asymptotic properties of nonlinear least squares estimators were studied in [14,20]. Assume that the design points x_i, with i = 1, . . . , n, satisfy the condition in (43) as n → ∞, where w^{(n)} is a sequence of vectors in R_+^{r×s} whose components are given in (42), and w ∈ R_+^{r×s}. Given that the condition in (43) is satisfied, the basic estimator ĝ_n(x) is consistent and has the following asymptotic distribution, where Y_{0,Σ} is a Gaussian vector with mean zero and diagonal covariance matrix Σ, whose elements are given by Σ_{ii} = σ^2 w_i, for i = 1, . . . , r × s, c.f. Theorem 5 in [20]. The next theorem gives the asymptotic distribution of the least squares estimator of the regression function under bimonotone constraints.
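Computationally, the bimonotone least squares projection can be obtained, for example, by Dykstra's cyclic projection algorithm, alternating projections onto the row-monotone and column-monotone cones, each of which is a one-dimensional pool-adjacent-violators fit. The sketch below is a minimal unweighted illustration and not the algorithm of [7]; for simplicity it fits a matrix nondecreasing in both indices, which corresponds to the order ≪ above after reversing the index directions.

```python
def pava(y):
    """Pool-adjacent-violators: nondecreasing least-squares fit to y."""
    vals, sizes = [], []
    for yi in y:
        vals.append(float(yi)); sizes.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v = (vals[-2] * sizes[-2] + vals[-1] * sizes[-1]) / (sizes[-2] + sizes[-1])
            sizes[-2:] = [sizes[-2] + sizes[-1]]
            vals[-2:] = [v]
    out = []
    for v, s in zip(vals, sizes):
        out.extend([v] * s)
    return out

def bimonotone_lse(z, iters=100):
    """Dykstra's algorithm projecting the matrix z onto the cone of
    matrices nondecreasing in both the row and the column index."""
    r, s = len(z), len(z[0])
    x = [[float(v) for v in row] for row in z]
    p = [[0.0] * s for _ in range(r)]  # Dykstra correction, row cone
    q = [[0.0] * s for _ in range(r)]  # Dykstra correction, column cone
    for _ in range(iters):
        for i in range(r):  # project x + p onto the row-nondecreasing cone
            y = [x[i][j] + p[i][j] for j in range(s)]
            fit = pava(y)
            for j in range(s):
                p[i][j], x[i][j] = y[j] - fit[j], fit[j]
        for j in range(s):  # project x + q onto the column-nondecreasing cone
            y = [x[i][j] + q[i][j] for i in range(r)]
            fit = pava(y)
            for i in range(r):
                q[i][j], x[i][j] = y[i] - fit[i], fit[i]
    return x
```

On a totally reversed matrix such as [[3, 1], [2, 0]] every pair of comparable entries violates the constraints, and the projection pools all entries to the overall mean 1.5.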