Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods

This introduction to the R package isotone is a (slightly) modified version of de Leeuw et al. (2009), published in the Journal of Statistical Software. In this paper we give a general framework for isotone optimization. First we discuss a generalized version of the pool-adjacent-violators algorithm (PAVA) to minimize a separable convex function with simple chain constraints. Besides general convex functions, we extend existing PAVA implementations in terms of observation weights, approaches for tie handling, and responses from repeated measurement designs. Since isotone optimization problems can be formulated as convex programming problems with linear constraints, we then develop a primal active set method to solve such problems. This methodology is applied to specific loss functions relevant in statistics. Both approaches are implemented in the R package isotone.


Introduction - History of monotone regression
In monotone (or isotone) regression we fit an increasing or decreasing function to a set of points in the plane. This generalization of linear regression has a fairly complicated history, which we now discuss.
On June 4, 1958, Constance van Eeden defended her dissertation at the University of Amsterdam. The dissertation, Testing and Estimating Ordered Parameters of Probability Distributions (van Eeden 1958), summarized and extended four articles published in 1956-1957 in Indagationes Mathematicae. Van Eeden's promotor Van Dantzig said in his speech at the promotion: "As often happens, just before you were able to obtain a final version of your result, a publication of an American professor appeared, which treated the same problem, while in a second publication by him, joint with four co-authors, the special case of a complete order was investigated in more detail."
He referred to Brunk (1955) and Ayer, Brunk, Ewing, Reid, and Silverman (1955), which both appeared in the Annals of Mathematical Statistics. To complicate the situation, there were also the papers by Miles (1959) and Bartholomew (1959a,b), which can be thought of as yet another independent discovery. Some of the interconnectivity, and some of the early history, is reviewed in the excellent book by the four B's (Barlow, Bartholomew, Bremner, and Brunk 1972).
Of the classical treatments of monotone regression, Van Eeden's is the most general one. Moreover, she very clearly separates the optimization problem, treated in full detail in Chapter 1 of her dissertation, from the statistical applications. Of course we must realize that this work predates the advent of convex analysis by about 15 years, and that it was done at the time that statisticians started showing interest in the Kuhn-Tucker conditions and in quadratic programming. In Van Eeden's setup, the problem is to minimize a strictly unimodal function f(x_1, . . . , x_n) under the bound constraints α_i ≤ x_i ≤ β_i and the partial order constraints σ_ij (x_i − x_j) ≥ 0. Here the σ_ij are either zero or ±1, and we suppose the system of inequalities defined by the bound and order constraints is consistent. By careful classical analysis, Van Eeden derives her algorithms, and shows how they specialize if the objective function is separable and/or a sum of squares, and if the order constraints define a complete order.
The next major contributions were by Robertson and Dykstra, summarized in the book by Robertson, Wright, and Dykstra (1988). At the same time, starting with Dykstra (1981), the isotonic regression problem was embedded more and more in quadratic and convex programming frameworks by, among others, Best and Chakravarti (1990) and Best, Chakravarti, and Ubhaya (2000). Recent major contributions, relying heavily on mathematical programming results, are Strömberg (1991), Ahuja and Orlin (2001), and Hansohm (2007). Extensions to particular partial orders defined by the variables of a multiple regression are in Dykstra and Robertson (1982) and Burdakov, Grimvall, and Hussian (2004).

A general isotone optimization framework

Basics of monotone regression
In simple linear regression we assume a linear relationship between a predictor z = (z_1, . . . , z_i, . . . , z_n) and a response y = (y_1, . . . , y_i, . . . , y_n). Note that for the predictors we use z instead of the common notation x since later on we embed this algorithm into a convex programming problem where the target variables are typically denoted by x. The loss function to be minimized is a least squares problem of the form

f(α, β) = Σ_{i=1}^n w_i (y_i − α − β z_i)^2,

where α and β are the regression parameters and w_i are optional observation weights. Extensions can be formulated in terms of polynomial or other nonlinear parametric regression specifications. In many situations, however, the researcher has no information regarding the mathematical specification of the true regression function. Rather, she can assume a particular shape which can be characterized by certain order restrictions. Typically, this means that the y_i's increase with the ordered z_i's. Such a situation is called isotonic regression; the decreasing case is called antitonic regression. The corresponding umbrella term for both cases is monotonic regression (see de Leeuw 2005, for a compact description).
Suppose that Z is the finite set {z_1, z_2, . . . , z_n} of the ordered predictors with no ties, i.e. z_1 < z_2 < · · · < z_n. Let y again be the observed response vector and x = (x_1, . . . , x_i, . . . , x_n) the unknown response values to be fitted. The least squares problem in monotonic regression can be stated as

f(x) = Σ_{i=1}^n w_i (y_i − x_i)^2,

which has to be minimized over x under the inequality restrictions x_1 ≤ x_2 ≤ · · · ≤ x_n for isotonic regression and x_1 ≥ x_2 ≥ · · · ≥ x_n for the antitonic case. These restrictions prevent us from always getting a perfect fit. The basic theorem underlying the isotonic solution of (2) is that if y_i ≥ y_{i+1}, then the fitted values satisfy x̂_{i+1} := x̂_i. Correspondingly, the antitonic solution requires y_i ≤ y_{i+1}.
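Base R already ships a PAVA-based solver for exactly this unweighted least squares problem, which makes for a quick illustration (a minimal sketch using stats::isoreg; the data are made up):

```r
# Isotonic least squares with base R's isoreg(): fit a
# non-decreasing step function to the points (i, y_i).
y <- c(1, 3, 2, 4, 5, 4)
fit <- isoreg(seq_along(y), y)
fit$yf  # fitted values: non-decreasing, violators pooled to means
```

Here the violating pairs (3, 2) and (5, 4) are pooled to their means 2.5 and 4.5, respectively.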
For a non-strict partial order of the predictors, i.e. z_1 ≤ z_2 ≤ · · · ≤ z_n, several approaches for the treatment of ties can be considered (Kruskal 1964; de Leeuw 1977). The primary approach partitions the index set {1, 2, . . . , n} into a number of tie blocks I_1, I_2, . . . , I_k with k ≤ n. In the case of a tie z_i = z_{i′} this approach implies that x_i does not necessarily equal x_{i′}. It forces only that the following monotonicity condition holds for the tied observations i and i′: if y_i ≤ y_{i′} then x_i ≤ x_{i′}. In practical applications this approach is the one mostly used. The secondary approach is more restrictive and requires x_i = x_{i′} for the tie i and i′, regardless of which y-values were observed. The tertiary approach defined in de Leeuw (1977) abandons the monotonicity condition of the primary approach. For each tie block I_1, I_2, . . . , I_k the units of analysis are the weighted means x̄_{I_1}, x̄_{I_2}, . . . , x̄_{I_k}. The tertiary approach requires only that these means are monotonic across tie blocks.

Generalization to ℓp and multiple measurements
Now we extend this classical isotonic regression problem in terms of non-least-squares loss functions and repeated observations. We establish the optimization problem P0, which involves a separable convex function. The building blocks of P0 are a (possibly infinite) open real interval I and n real-valued convex functions f_i defined on I. Problem P0 is to minimize the separable function of n variables

f(x) = Σ_{i=1}^n f_i(x_i)

over all non-decreasing x ∈ I^n, i.e. x_1 ≤ x_2 ≤ · · · ≤ x_n. The response vector of length m_i for each observation i is denoted by y_i; multiple entries y_ij arise, e.g., from repeated observations.
Common special cases of (3) are the ℓp norms with p = 2 and p = 1,

f(x) = Σ_{i=1}^n Σ_{j=1}^{m_i} w_ij (y_ij − x_i)^2   and   f(x) = Σ_{i=1}^n Σ_{j=1}^{m_i} w_ij |y_ij − x_i|.

Least squares estimation for the first problem results in estimates that approximate the conditional mean of the response given predictor value z_i. If weights w_i are involved, we get the weighted mean. If we estimate the second equation within a quantile regression framework (see Koenker 2005), the estimates approximate the (weighted) median. The general quantile regression specification is

f(x) = Σ_{i=1}^n ( a Σ_{j: y_ij > x_i} w_ij |y_ij − x_i| + b Σ_{j: y_ij < x_i} w_ij |y_ij − x_i| ),

with a, b > 0. The corresponding proportions are a/(a + b) and b/(a + b). Note that for a = b Equation 7 reduces again to the weighted median. Let us refer to the expressions for the conditional (weighted) mean and (weighted) quantiles as the solver s(·). This function plays a central role for the parameter updates in PAVA described in the next section. Note that in our general framework the user can specify other convex functions and their corresponding solvers. As we will see in Section 5.1, the gpava() function takes a (user-defined) solver as argument, which is sufficient for defining the PAVA optimization problem.
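For the ℓ1 case the solver is a weighted median; a minimal base R sketch (the function name weighted.median is ours, not from the package) is:

```r
# Lower weighted median: smallest y value whose cumulative weight
# share reaches one half; usable as solver s(.) for the l1 loss.
weighted.median <- function(y, w = rep(1, length(y))) {
  ord <- order(y)
  y <- y[ord]; w <- w[ord]
  y[which(cumsum(w) / sum(w) >= 0.5)[1]]
}
```

For equal weights this reduces to the ordinary (lower) median.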

Isotone optimization as a convex programming problem
The whole theory in these sections is based on the fact that isotone optimization can be formulated as a convex programming problem with inequality constraints. For instance, the least squares problem in Equation 1 is one particular example of such a convex program. The isotonicity condition is contained in the side constraints. These inequalities defining isotonicity can be written in matrix form as Ax ≥ 0, with x ∈ R^n as the target variable to be optimized. A is an m × n matrix in which each row corresponds to an index pair i, j such that i ≻ j. Such a row of length n has element i equal to +1, element j equal to −1, and the remaining elements equal to zero. Formally, such isotone optimization problems, written as convex programming problems with linear inequality constraints, look as follows:

minimize f(x) subject to Ax ≥ 0.

The constraints in isotone regression are a total order. We now consider the more general case of partial orders. In order to eliminate redundancies we include a row in A for a pair (i, j) iff i covers j, which means that i ≻ j and there is no k ≠ i, j such that i ≻ k ≻ j. This specification allows us to specify isotonicity patterns in a flexible manner. Some examples are given in Figure 1 by means of Hasse diagrams (also known as cover graphs).
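For a total order the constraint matrix A described above is easy to build explicitly (an illustrative sketch; the package itself works with index pairs rather than this dense matrix):

```r
# Rows encode x_{i+1} - x_i >= 0 for the total order x1 <= ... <= xn.
n <- 5
A <- matrix(0, n - 1, n)
for (i in seq_len(n - 1)) {
  A[i, i]     <- -1   # element j of the pair gets -1
  A[i, i + 1] <- +1   # element i of the pair gets +1
}
```

Any sorted vector x then satisfies A %*% x >= 0.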
The Lagrangian associated with problem P is

L(x, λ) = f(x) − λ'Ax,

with λ as the Lagrange multiplier vector. The associated Lagrange dual function can be expressed as

g(λ) = inf_x L(x, λ) = inf_x ( f(x) − λ'Ax ),

and the corresponding dual optimization problem is

maximize g(λ) subject to λ ≥ 0.

The Lagrange dual can be reformulated using the convex conjugate. Let x* ∈ R^n. The conjugate function of f(x), denoted by f*(x*), is the maximum gap between the linear function ⟨x*, x⟩ and the target function f(x) and can be expressed as

f*(x*) = sup_x ( ⟨x*, x⟩ − f(x) ).

It can be shown (see Boyd and Vandenberghe 2004, p. 221) that

g(λ) = −f*(A'λ).

In other words: the Lagrange dual is determined by the convex conjugate evaluated at A'λ. This relates the convex primal and the dual optimization problem in the following manner:

minimize f(x) subject to Ax ≥ 0   ↔   maximize −f*(A'λ) subject to λ ≥ 0.

As necessary conditions for the minimum we give the Karush-Kuhn-Tucker (KKT) conditions for this problem. The convex function f is minimized on the convex set {x | Ax ≥ 0} at x̂ iff there exists a vector of Lagrange multipliers λ̂ (i.e., the KKT vector) such that (see Rockafellar 1970, Chapter 28)

A'λ̂ ∈ ∂f(x̂),   Ax̂ ≥ 0,   λ̂ ≥ 0,   λ̂'Ax̂ = 0.

Here ∂f(x̂) is the subdifferential of f at x̂. In general, the subdifferential at x is the set of all subgradients z satisfying

f(y) ≥ f(x) + ⟨z, y − x⟩   for all y.

The subdifferential is a convex compact set. If f is differentiable at x, there is a unique subgradient, the gradient ∇f(x). Thus, in the differentiable case, the necessary and sufficient conditions for x̂ to be a minimizer are the existence of a KKT vector λ̂ such that

∇f(x̂) = A'λ̂,   Ax̂ ≥ 0,   λ̂ ≥ 0,   λ̂'Ax̂ = 0.

These conditions are referred to as stationarity, primal feasibility, dual feasibility, and complementary slackness, respectively.

Active set method for linearly constrained convex programming problems
Basically, there are two classes of algorithms for solving convex optimization problems. The first class are interior point methods, which aim for complementary slackness while maintaining primal and dual feasibility. The second class are active set strategies, of which we distinguish two variants:

Primal active set methods: They aim for dual feasibility while maintaining primal feasibility and complementary slackness.
Dual active set methods: They aim for primal feasibility while maintaining dual feasibility and complementary slackness.
Active set methods in general are approaches for solving linear, quadratic, and convex programming problems with inequality constraints, and they are most effective for small- to medium-scale problems. In our implementation, by means of the function activeSet(), we provide a primal active set strategy which Zangwill (1969, Chapter 8) denotes as manifold suboptimization.
Recall problem P given in Equation 8, where the aim is to minimize f(x) over Ax ≥ 0. The minimum is f(x̂) and the corresponding minimizer is x̂. Write I for subsets of the index set I = {1, 2, . . . , m} of the constraints. Then A_I is the corresponding card(I) × n submatrix of A, and A_Ī is the (m − card(I)) × n complementary submatrix. The active constraints at x, which we write as A(x), are the indices i for which a_i'x = 0. That is, at a given point x a constraint is called active if a_i'x = 0 and inactive if a_i'x > 0. Each iteration in the active set algorithm attempts to locate the solution to an equality problem in which only the active constraints occur. There are several ways to state the active set optimization problem for our general convex case with linearity restrictions. Let us first elaborate an auxiliary problem, denoted by P_I^+, which partitions the constraint set into equality and strict inequality restrictions:

minimize f(x) subject to A_I x = 0 and A_Ī x > 0.

Note that this partitions the constraint set {x | Ax ≥ 0} into 2^m faces, some of which may be empty. Solution x̂_I^+ is optimal for P_I^+ iff there exists a KKT vector λ̂_I^+ such that

A_I'λ̂_I^+ ∈ ∂f(x̂_I^+).

The crucial condition for x̂_I^+ to be the optimum also for P is the dual feasibility condition λ̂_I^+ ≥ 0. If this holds then x̂_I^+ is optimal for P. Conversely, if A(x̂) are the indices of the active constraints at the solution x̂ of P, then x̂ solves the corresponding problem P^+ as well.
A second way to give an active set formulation for our setting is problem P_I, which includes equality restrictions only:

minimize f(x) subject to A_I x = 0.

Solution x̂_I is optimal for P_I iff there exists a KKT vector λ̂_I such that

A_I'λ̂_I ∈ ∂f(x̂_I).

To ensure that x̂_I is optimal for P as well, we have to check whether the primal feasibility A_Ī x̂_I ≥ 0 and, again, the dual feasibility λ̂_I ≥ 0 hold. Conversely, x̂ with the corresponding active constraints A(x̂) solves the associated equality problem. Thus, if we knew A(x̂) we could solve P by solving P_I.
In order to achieve this we discuss an equivalent formulation of P_I. Basically, the equality constraints A_I x = 0 define a relation ≈_I on {1, 2, . . . , n}, with i ≈_I k if there is a row j of A_I in which both a_ji and a_jk are non-zero. The reflexive and transitive closure of ≈_I is an equivalence relation, which can be coded as an indicator matrix G_I, i.e. a binary matrix in which each of the n rows has exactly one element equal to one.
In the package we compute G_I from A_I in two steps. We first make the adjacency matrix of ≈_I and add the identity to make it reflexive. We then apply Warshall's algorithm (Warshall 1962) to replace the adjacency matrix by that of the transitive closure, which is an equivalence relation. Thus the transitive adjacency matrix has blocks of ones for the equivalence classes. We then select the unique rows of the transitive adjacency matrix (using the unique() function), and transpose to get G_I. Note that G_I is of full column rank, even if A_I is singular. We write r_I for the number of equivalence classes. Thus G_I is an n × r_I matrix satisfying A_I G_I = 0; in fact G_I is a basis for the null space of A_I. Moreover D_I := G_I'G_I is diagonal and indicates the number of elements in each of the equivalence classes.
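The two steps can be sketched in a few lines of R (illustrative only, not the package's internal code; the one-constraint A_I below is made up):

```r
# Boolean Warshall closure, then unique rows of the closure,
# transposed, to obtain the indicator matrix G_I.
warshall <- function(M) {
  for (k in seq_len(nrow(M)))
    for (i in seq_len(nrow(M)))
      if (M[i, k]) M[i, ] <- M[i, ] | M[k, ]
  M
}
# A_I enforcing x1 = x2 (one equality constraint) on n = 3 variables
AI  <- matrix(c(-1, 1, 0), nrow = 1)
adj <- (t(AI) %*% AI != 0) | diag(3)   # adjacency plus reflexivity
GI  <- t(unique(warshall(adj)))        # n x r_I indicator matrix
```

One can check that A_I G_I = 0 and that every row of G_I has exactly one non-zero element.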
Finally, problem P_I can be rewritten as the unconstrained convex optimization problem

minimize f(G_I ξ_I)

with ξ_I ∈ R^{r_I}. The vector ξ̂_I is a solution iff 0 ∈ G_I'∂f(G_I ξ̂_I). Then x̂_I = G_I ξ̂_I solves P_I. If 0 ∈ G_I'∂f(G_I ξ̂_I), it follows that there is a non-empty intersection of the subdifferential ∂f(G_I ξ̂_I) and the row space of A_I, i.e., there is a KKT vector λ̂_I with A_I'λ̂_I ∈ ∂f(G_I ξ̂_I).
In R we can simply use the optim() function (e.g. with the BFGS quasi-Newton method) to minimize f(G_I ξ) over ξ, and then set x̂ = G_I ξ̂. This guarantees (if the optimum is found with sufficient precision) that the gradient at x̂ is orthogonal to the indicator matrix G_I, and consequently that Lagrange multipliers can be computed. By making sure that A_I has full row rank, the Kuhn-Tucker vector is actually unique. Because of the way optim() is written it is also possible to tackle problems with non-differentiable loss functions, or even non-convex ones. But we are not responsible if this gets you into trouble.
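A minimal sketch of this reduced problem, with a least squares f and a hand-made indicator matrix G (both illustrative):

```r
# Minimize f(G xi) over xi with optim(); x = G xi is then constant
# within each equivalence class (two classes of two elements here).
y <- c(3, 1, 2, 4)
G <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
f <- function(xi) sum((y - G %*% xi)^2)
fit <- optim(c(0, 0), f, method = "BFGS")
xhat <- as.numeric(G %*% fit$par)  # class means: 2 and 3
```

The fitted x equals the class means, since least squares within an equivalence class is solved by the mean.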
To conclude, in this section we elaborated auxiliary formulations of the active set problem. In the algorithmic implementation described in the next section we tackle the optimization of problem P by iteratively defining and solving subproblems P_I (including the indicator matrix G_I).
PAVA and active set algorithm

A generalized PAVA approach

Monotonic regression has a graphical interpretation in terms of the greatest convex minorant (GCM), and the method of successive approximation to the GCM can be described algebraically as PAVA (see e.g. Robertson et al. 1988, p. 9-10). In this paper we focus on the algorithmic description of our PAVA implementation following Barlow et al. (1972, p. 15).
PAVA is a very simple iterative algorithm for solving monotonic regression problems: it computes a non-decreasing sequence of values x_i such that problem P0 is optimized. In the simple isotonic regression case we have the measurement pairs (z_i, y_i). Let us assume that these pairs are ordered with respect to the predictors. The initial solution at iteration l = 0 is x_i^(0) := y_i. Then:

1. Merge adjacent x-values into blocks r wherever they violate monotonicity (pool adjacent violators).
2. Solve f(x) for each block r, i.e., compute the update based on the solver, which gives x_r^(l+1). Increase l := l + 1 and go back to Step 1.

The algorithm stops when the x-blocks are increasing, i.e., x_r^(l) ≤ x_{r+1}^(l) for all r. Finally the block values are expanded with respect to the observations i = 1, . . . , n such that the final result is the vector x̂ of length n with elements x̂_i in increasing order. In the case of repeated measurements as described in Section 2.2 the starting values are x_i^(0) := s(y_ij). Then we proceed as described above.
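The pooling scheme above can be condensed into a short, unweighted least squares sketch (illustrative; the package's gpava() is far more general, handling weights, ties, and arbitrary solvers):

```r
# Minimal least-squares PAVA: pool adjacent violating blocks to
# their weighted means, then expand blocks back to n fitted values.
pava <- function(y) {
  x <- y; w <- rep(1, length(y))
  i <- 1
  while (i < length(x)) {
    if (x[i] > x[i + 1]) {          # adjacent violator: pool blocks
      pooled <- (w[i] * x[i] + w[i + 1] * x[i + 1]) / (w[i] + w[i + 1])
      x <- c(x[seq_len(i - 1)], pooled, x[-seq_len(i + 1)])
      w <- c(w[seq_len(i - 1)], w[i] + w[i + 1], w[-seq_len(i + 1)])
      i <- max(i - 1, 1)            # re-check the previous block
    } else i <- i + 1
  }
  rep(x, times = w)                 # expand blocks to length n
}
```

For y = (1, 3, 2, 4) this pools the violating pair (3, 2) and returns (1, 2.5, 2.5, 4).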
A final note concerns the relation between the generalized PAVA and the active set strategy. Robertson et al. (1988, Chapter 6) provide a chapter about duality in monotone regression. For the simple least squares case Best and Chakravarti (1990) show that PAVA can be considered as a dual active set method. In a subsequent paper, Best et al. (2000) extend this relation to the general ℓp case. Therefore it can be concluded that PAVA for general ℓp cases belongs to the family of dual active set methods.

Algorithmic implementation of the active set formulation
Active set descriptions for convex (quadratic) programming problems can be found in Fletcher (1987, Section 10.3) and Nocedal and Wright (1999, Section 16.4). In general, based on a feasible starting point, the active set strategy works as follows:

1. Solve the equality problem defined by the active set A.
2. Compute the Lagrange multipliers λ for A.
3. Remove a subset of constraints with λ < 0.
4. Search for infeasible constraints and return to step 1.
These steps are repeated until we find a solution that is "optimal enough". In practice (and in our package), the zero boundaries for the constraints and the dual feasibility check are relaxed in favor of a small value ε, which leads to Ax ≥ −ε and λ ≥ −ε.
Again, our goal is to solve problem P given in (15). Based on the elaborations above, in our active set approach we solve a finite sequence of subproblems P_I, which minimize f(x) over x satisfying A_I x = 0. After solving each of the subproblems we change the active set A(x), either by adding or by dropping a constraint. The algorithm can be expected to be efficient if minimizing f(x) under simple equality constraints, or equivalently minimizing f(G_I ξ_I) over ξ_I, can be done quickly and reliably. Starting with a feasible point x^(0), which defines the index sets I^(0) = A(x^(0)) and its complement Ī^(0), the algorithm iterates the following steps:

1. Solve the subproblem P_{I^(l)} defined by the current active constraints, yielding x̂^(l).
2. If x̂^(l) violates some of the remaining constraints in Ī^(l), add a violated (blocking) constraint to the active set, which defines A^(l+1), and go back to step 1.
3. If x̂^(l) is feasible, we determine the corresponding Lagrange multipliers λ^(l)_I for P_{I^(l)}. If λ^(l) ≥ 0 we have solved P and the algorithm stops. If min λ^(l)_I < 0, we find the most negative Lagrange multiplier, drop the corresponding equality constraint from A^(l) to define a new and smaller set of active constraints A^(l+1), and go back to step 1.
In step 2 the blocking constraint is found among the violated constraints i ∈ Ī^(l). Finding the smallest Lagrange multiplier in step 3 is straightforward to implement in the differentiable case. We have to solve A_{I^(l)}'λ_I = ∇f(x̂^(l)), which is achieved by using the formulation given in (17). Because G_{I^(l)}'∇f(x̂^(l)) = 0 and A_{I^(l)} is of full row rank, there is a unique solution λ_{I^(l)}.
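Under full row rank the multipliers can be recovered directly from the stationarity condition; a small made-up numerical sketch:

```r
# Solve A_I' lambda = grad f(xhat) for lambda. A_I (two active
# constraints on three variables) and the gradient are hypothetical,
# with the gradient chosen to lie in the row space of A_I.
AI <- rbind(c(-1, 1, 0),
            c(0, -1, 1))
g  <- c(-2, 1, 1)            # hypothetical gradient at xhat
lambda <- qr.solve(t(AI), g) # least squares solve, exact here
```

Here both multipliers turn out non-negative, so dual feasibility would hold at this point.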

Special cases for active set optimization
In the convex non-differentiable case, matters are more complicated. We have to deal with the fact that, in general, ∂f(x) may not be a singleton. It is possible to develop a general theory for active set methods in this case (Panier 1987), but we will just look at two important special cases.

The weighted Chebyshev norm
The first case we consider is the ℓ∞ or Chebyshev norm. We have to minimize

f(ξ) = max_i w_i |h_i(ξ)|,

where h(ξ) = y − Gξ are the residuals based on the observed response values y. We assume, without loss of generality, that w_i > 0 for all i. The minimization can be done for each of the r columns of the indicator matrix G, with elements g_ij, separately. The solution ξ̂_j is the corresponding weighted mid-range. More specifically, let I_j = {i | g_ij = 1}. Then

f_j(ξ̂_j) = max_{(i,k) ∈ I_j} (w_i w_k / (w_i + w_k)) (y_k − y_i).

If the (not necessarily unique) maximum over (i, k) ∈ I_j is attained at (i_j, k_j), then the minimum of f over ξ is attained at

ξ̂_j = (w_{i_j} y_{i_j} + w_{k_j} y_{k_j}) / (w_{i_j} + w_{k_j}),

where we choose the order within the pair (i_j, k_j) such that y_{i_j} ≤ ξ̂_j ≤ y_{k_j}. These results also apply if I_j is a singleton {i}, in which case ξ̂_j = y_i and f_j(ξ̂_j) = 0. Set x̂ = Gξ̂.
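The pairwise characterization above can be checked with a brute-force sketch (the function name is ours, not the package's):

```r
# Weighted mid-range: minimize max_i w_i * |y_i - xi| over scalar xi
# via the maximizing pair described in the text.
weighted.midrange <- function(y, w) {
  best <- -Inf; xi <- y[1]
  for (i in seq_along(y)) for (k in seq_along(y)) {
    val <- w[i] * w[k] * (y[k] - y[i]) / (w[i] + w[k])
    if (val > best) {
      best <- val
      xi <- (w[i] * y[i] + w[k] * y[k]) / (w[i] + w[k])
    }
  }
  xi
}
```

For y = (0, 10) and w = (1, 3) this gives 7.5, where both weighted deviations equal 7.5; for equal weights it reduces to the ordinary mid-range.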
Next we must compute a subgradient in ∂f(x̂) orthogonal to G. Suppose that e_i is a unit weight vector, i.e. a vector with all elements equal to zero, except element i, which is equal to either plus or minus w_i. Consider the set E of the 2n unit weight vectors. Then

f(ξ) = max_{e ∈ E} ⟨e, h(ξ)⟩,

and we write E(ξ) for the set of unit weight vectors attaining this maximum. Then, by the formula for the subdifferential of the pointwise maximum of a finite number of convex functions (also known as Danskin's theorem; see Danskin 1966), we have ∂f(ξ) = conv(E(ξ)), with conv() denoting the convex hull.
Choose any j for which f_j(ξ̂_j) is maximal. Such a j may not be unique in general. The index pair (i_j, k_j) corresponds with the two unit weight vectors with non-zero elements −w_{i_j} and +w_{k_j}. The subgradient we choose is the convex combination which has element −1 at position i_j and element +1 at position k_j. It is orthogonal to G, and thus we can find a corresponding Kuhn-Tucker vector.
In the isotone package the Chebyshev norm is provided by the solver function mSolver().

The weighted absolute value norm
For the ℓ1 or weighted absolute value norm we find the optimum ξ̂ by computing weighted medians instead of weighted mid-ranges. Uniqueness problems, and the subdifferentials, will generally be much smaller than in the ℓ∞ case.
For ℓ1 we define E to be the set of 2^n vectors (±w_1, ±w_2, · · · , ±w_n). The subdifferential is the convex hull of the vectors e ∈ E for which ⟨e, h(ξ)⟩ = f(ξ). If h_i(ξ) ≠ 0 then e_i = sign(h_i(ξ))w_i, but if h_i(ξ) = 0 element e_i can be any number in [−w_i, +w_i]. Thus the subdifferential is a multidimensional rectangle. If the medians are not equal to the observations the loss function is differentiable. If h_i(ξ) = 0 for some i in I_j then we select the corresponding elements of the subgradient in such a way that they add up to zero over all i ∈ I_j.
In the package the dSolver() function implements the weighted absolute value norm.

Some additional solvers
The isotone package provides some pre-specified loss functions but also allows the user to define their own. The corresponding argument in the function activeSet() is mySolver. Each solver has extra arguments, which we describe in this section.
A user-defined specification of a convex differentiable function can be achieved by setting mySolver = fSolver. The target function itself is passed to the function by means of the fobj argument and the corresponding gradient using gobj. Note that it is not at all necessary that the problems are of the regression or projection type, i.e. minimize some norm of y − x. In fact, the driver can easily be modified to deal with general convex optimization problems with linear inequality constraints which are not necessarily of the isotone type. We give examples in the next section.

Now let us describe some functions pre-specified in the package. We start with the cases which solve the equivalent formulation of problem P_I given in (17), that is, the one which minimizes over ξ. Because of the structure of these problems it is more efficient to use this formulation. The solvers passed to the mySolver argument are:

lsSolver: Solves the least squares ℓ2 norm given in Equation 2.

dSolver: Minimizes the weighted absolute value ℓ1 norm as given in (23).
pSolver: Solves the quantile regression problem as given in (7).

lfSolver: Tackles a general least squares problem of the form

f(x) = (y − x)'W(y − x),

where W is a not necessarily positive semi-definite matrix of order n.
The extra arguments for all these solvers are y for the observed response vector and weights for optional observation weights. The functions return a list containing x as the fitted values, lbd as the Lagrange vector, f as the value of the target function, and gx as the gradient at point x.
Furthermore, we provide some hybrids which are just wrapper functions and call fSolver() internally. For these, the user does not need to provide fobj and gobj.
sSolver: Solves the negative Poisson log-likelihood with loss function f(x) = Σ_{i=1}^n (x_i − y_i log x_i).
oSolver: Minimizes the ℓp norm as given in (4) with m_i = 1 for all i.

aSolver: Efron (1991) gives an asymmetric least squares formulation based on (2), with weight b_w for (y_i − x_i) ≤ 0 (extra argument bw) and weight a_w for (y_i − x_i) > 0 (extra argument aw).

eSolver: Solves the ε-insensitive loss; correspondingly, the additional extra argument is eps.

hSolver: Solves the Huber loss function (Huber 1981) with extra argument eps.

iSolver: Implements the SILF loss of Chu, Keerthi, and Ong (2004) with the two extra parameters ε (argument eps) and β (argument beta).

With a little extra effort various other fashionable SVM and lasso isotone regressions could be added.

Package description and examples
The isotone package consists of two main components: the PAVA component with its main function gpava() and the active set component with its main function activeSet(). In the following sections we describe their basic functionalities and give corresponding examples. Table 1 gives an overview of available PAVA implementations in R. It lists the package name, the corresponding function, whether this function is externally accessible, a brief description, and the language in which the core computations are implemented.

The PAVA component
None of the functions in Table 1 allows for the specification of general convex functions, repeated measurements, and the handling of ties for such structures. Our package, which is available on CRAN, allows for the computation of rather general target functions with additional options in terms of repeated measurements, tie handling, and weights.
The main function for this generalized PAVA computation is gpava(). S3 methods such as print and plot are provided, and in the following subsections we present two simple examples.

Monotone regression with ties for pituitary data
The first example is based on a dataset from Pothoff and Roy (1964), which is also analyzed in Robertson et al. (1988). The Dental School at the University of North Carolina measured the size of the pituitary fissure (in millimeters) on 11 subjects from 8 to 14 years of age. The predictor is age, and it can be expected that the size of the fissure increases with age. The data are of the following structure:

> data(pituitary)
> head(pituitary)

As we see, we have several ties on the predictors. By means of this example we present the different results caused by the different approaches for handling ties (i.e. primary, secondary, tertiary). We compute the isotonic regression using the function gpava():

> res1 <- with(pituitary, gpava(age, size, ties = "primary"))
> res2 <- with(pituitary, gpava(age, size, ties = "secondary"))
> res3 <- with(pituitary, gpava(age, size, ties = "tertiary"))

For the primary method we can have different (monotonic) fitted values within tied observations, for the secondary the fitted values are the same within ties, and the tertiary approach only requires monotonicity on the means. This can be examined by doing

> tapply(res3$x, res3$z, mean)

Using the tertiary approach the fitted values within ties can even decrease (see, e.g., within the 10 year old children). The plots in Figure 2 show the fitted step functions for each of the approaches.

Repeated measures using posturographic data
To demonstrate PAVA on longitudinal data we use a subset of the data collected in Leitner, Mair, Paul, Wick, Mittermaier, Sycha, and Ebenbichler (2008). The sample consists of 50 subjects (healthy controls and patients with chronic low back pain). The subjects' task in a sensory organization test (SOT) was to keep their balance on a dynamic 18 × 18 inch dual force plate. Measurements on various testing conditions were collected and the SOT equilibrium scores computed. They were based on the maximum anterior-posterior sway angle during the SOT trials and reflected the overall coordination of the visual, proprioceptive, and vestibular components of balance needed to maintain standing posture. These equilibrium scores represent the angular difference between the subject's calculated anterior/posterior centre of gravity displacements and the theoretical maximum of 12.5°. A score of 100 theoretically indicates no anterior/posterior excursion. A score of 0 indicates a fall of the subject. Thus, the higher the score, the more able a person is to maintain balance.
The subset we select for this example is based on the condition where both proprioceptive and visual stimuli are altered by moving a surrounding visual screen and the platform with the subject's anterior/posterior body sway. We examine the relationship between body height and three repeated SOT scores.
> data("posturo")
> head(posturo)

PAVA is performed on the weighted median target function, using the secondary approach for ties. The fitted step functions for the two different target functions are given in Figure 3.

The active set component
As mentioned above, a primal active set implementation according to the elaborations in Section 2.4 is provided by the function activeSet(), which has the following arguments:

z: Vector containing the observed predictor values. The response values y are passed to the solver function (see below); the predictor values z define the order for the response y.

The matrix Atot used below defines the pairwise isotonicity constraints; it needs to have 2 columns and as many rows as unique isotonicity constraints, and here it defines a total order. The specification is always as follows: element in column 2 ⪰ element in column 1; e.g., the first row states x_2 ≥ x_1.
Let us start with a simple least squares model using the lsSolver and the equivalent userspecification of the 2 norm using fSolver with corresponding target function fobj and its gradient gobj.
> set.seed(12345)
> yp <- rpois(9, 5)
> fit.poi <- activeSet(z, Atot, sSolver, y = yp)

So far we focused on total orders only. Going back to the LS case, we can specify different types of orders according to the Hasse diagrams in Figure 1, e.g. the tree order in Subfigure 1(b), simply by changing the rows of the pairwise constraint matrix.

To conclude, we discussed a generalized PAVA and extended it to general convex programming problems with linear inequality constraints. This leads to a general isotone optimization framework that is implemented in the R package isotone. Applications on real and simulated data sets were shown.
One option for the extension of the PAVA algorithm is to allow for multiple predictors. This approach is elaborated in Burdakov et al. (2004).
Of course, the field of convex analysis is rather extensive and our activeSet() function covers only a certain fraction of it (cf. Boyd and Vandenberghe 2004). Since we use optim() in the fSolver() function, this may cause trouble for non-differentiable convex functions. Therefore, replacing this optimizer would facilitate the optimization of such non-differentiable problems. In addition, for a target f(x) we allow for only one particular type of solver s(x), that is, the target function is separable but the components must be of the same type. This could be extended in terms of different solvers for one particular optimization problem.