isocir : An R Package for Constrained Inference using Isotonic Regression for Circular Data, with an Application to Cell Biology.

In many applications one may be interested in drawing inferences regarding the order of a collection of points on a unit circle. Due to the underlying geometry of the circle standard constrained inference procedures developed for Euclidean space data are not applicable. Recently, statistical inference for parameters under such order constraints on a unit circle was discussed in Rueda et al. (2009); Fernández et al. (2012). In this paper we introduce an R package called isocir which provides a set of functions that can be used for analyzing angular data subject to order constraints on a unit circle. Since this work is motivated by applications in cell biology, we illustrate the proposed package using a relevant cell cycle data


Introduction
Circular data arise in a wide range of contexts, such as in geography, cell biology, circadian biology, endocrinology, ornithology, etc (cf Zar (1999) or Mardia et al. (2008)).This work is motivated by applications in cell cycle biology where one may want to draw inferences regarding angular parameters subject to order constraints on a unit circle.A cell cycle among eukaryotes follows a well-coordinated process where cells go through cycle four phases of distinct biological functions, namely, G1, S, G2 and M phase (see Figure 1).cular order and extended the notion of isotonic regression estimator to circular parameter space by defining the circular isotonic regression estimator (CIRE).Using CIRE, Fernández et al. (2012) developed a formal statistical theory and methodology for testing whether the circular order of peak expression of a subset of cell cycle genes is conserved across multiple species.
These statistical methods may have numerous other applications apart from cell cycle.With the increase in human survival rates, there is considerable interest in understanding neuro-Figure 1: Phases of a cell cycle Genes participating in the cell division cycle are often called cell cycle genes.A cell cycle gene has a periodic expression with its peak expression occurring just before its biological function (Jensen et al. (2006)).Since the periodic expression of a cell cycle gene can be mapped onto a unit circle, the angle corresponding to its peak expression is known as the phase angle of the gene (see Liu et al. (2004) and Figure 2).
Since cell cycle is fundamental to the growth and development of an organism, cell biologists have been interested in understanding various aspects of cell cycle that are evolutionarily conserved.For instance, they would like to identify genes whose relative order of peak expressions is evolutionarily conserved.In order to solve such problems Rueda et al. (2009) introduced an order restriction on the unit circle, called cir-Figure 2: Phase angle (φ) logical diseases related to aging such as the Alzheimer's disease (AD) and the Parkinson's disease (PD).An important aspect of such neurological diseases is the disruption of circadian clock and genes participating in it.Researchers are interested in testing differences in the phases of expression of circadian genes in normal and AD patients (Cermakian et al. (2011)).
Methodology discussed in this paper can be used for analyzing such data.Other areas of applications include; the study of migratory patterns and directions of birds (Cochran et al. (2004)), the changes in the wind directions (Bowers et al. (2000)), directional fluctuations in the atmosphere (van Doorn et al. (2000)), psychology (studies of mental maps or monitoring data (Kibiak and Jonas (2007)), the orientation of ridges in fingerprints or magnetic maps (Boles and Lohmann (2003)).
Motivated by the wide range of applications and the non-existence of a user friendly software, in Section 3 we introduce the isocir package programmed in R environment, R Development Core Team (2011), which can be downloaded from http://cran.r-project.org/web/packages/isocir/.The package provides functions which can be used for drawing inferences regarding the order of a collection of points on a unit circle.In Section 2 we describe the statistical problem and the methodology of Rueda et al. (2009) and Fernández et al. (2012).The isocir package is illustrated in Section 4 using the motivating cell cycle gene expression data.Some concluding remarks are provided in Section 5.

Circular Order Restriction
Let θ = (θ 1 , θ 2 , . . ., θ q ) where each θ i , i = 1, 2, . . ., q is the sample circular mean of a random sample of size n i from a population with unknown circular mean φ i .All angles are defined in a counter clockwise direction relative to a given pole.The mean resultant lengths for each population are denoted as r 1 , . . ., r q (see Mardia and Jupp (2000) for the definition of circular mean and mean resultant length of a set of angles).Then the problem of interest is to draw statistical inferences on φ i , i = 1, 2, . . ., q, subject to the constraint that the angles φ 1 , φ 2 , . . ., φ q are in a counter clockwise order on a unit circle.Thus φ 1 is "followed" by φ 2 which is "followed" by φ 3 , . . ., φ q is "followed" by φ 1 .More precisely, we shall denote this simple circular order among angular parameters as follows: It is important to note that the order among the angular parameters is invariant under changes in location of the pole (the initial point of the circle).Unlike the Euclidean space, points on a unit circle wrap around.That is, starting at the pole, by traveling 2π radians around the circumference of the circle one would return to pole.For this reason the circular order among points on a unit circle is preserved even if the location of the pole is shifted.This is why Rueda et al. (2009) and Fernández et al. (2012) refer to the circular order C sco as isotropic order.As in this paper we will also consider more general circular order restrictions, from now on we will refer to C sco as simple circular order.
As a consequence of the geometry, a circle can never be linearized and hence methods developed for Euclidean space data are not applicable to circular data.The problem is even more challenging when the angular parameters are constrained by an order around the circle, such as C sco .General methodology for circular data, when there are no constraints on the angular parameters, can be found in the book Mardia and Jupp (2000), among others.Constrained inference for circular data is rather recent (Rueda et al. (2009) and Fernández et al. (2012)).
As noted in Rueda et al. (2009), standard Euclidean space methods such as the pool adjacent violators algorithm (PAVA) used for computing isotonic regression (see Robertson et al. (1988) for details) cannot be applied to circular data.For example, when a cell biologist is investigating a large number of cell cycle genes, it may be difficult to ascertain the circular order among all cell cycle genes under consideration.However, based on the underlying biology, the investigator may a priori know the circular order among groups of genes but not the order among genes within each group.In such situations a partial circular order, C pco as defined below, can be used.
(2) In this case we have L sets of parameters with l j angular parameters in set j and q = L j=1 l j .Order among the parameters within a set is not known but every parameter in a given set is "followed" by every parameter in the next set.

Estimation and Testing under Circular Order Restrictions
Analogous to PAVA for Euclidean data, Rueda et al. (2009) derived a circular isotonic regression estimator (CIRE) for estimating angular parameters (φ 1 , φ 2 , . . ., φ q ) subject to a circular order.The CIRE of φ = (φ 1 , φ 2 , . . ., φ q ), under the constraint (φ 1 , φ 2 , . . ., φ q ) ′ ∈ C sco , is given by: where SCE(φ, θ), defined below, is the Sum of Circular Errors, a circle analog to the Sum of Squared Errors (SSE) used for Euclidean data; where r i are the mean resultant lengths.The CIRE is implemented in the function CIRE of the package isocir.
Just as the normal distribution is commonly used for the Euclidean space data, the von-Mises distribution is widely used for describing angular data on a unit circle.Accordingly, for i = 1, 2, . . ., q, throughout this paper we assume that θ i are independently distributed according to von-Mises distribution, denoted as M (φ i , κ) where φ i is the modal direction and κ is the concentration parameter (see Mardia and Jupp (2000)).Under such an assumption, Fernández et al. (2012) developed a conditional test for testing the following hypotheses: The conditional test statistic is given by , where κ is the estimator of κ, φ is the CIRE computed under H 0 .The estimate φ determines a partition of ℘ = {1, . . ., I} into sets of coordinates on which φ is constant.These sets are called level sets.The rejection region for the conditional α-level test is given by: where m is the number of level sets for φ and, for large values of κ, the approximate critical value c(m) is chosen so that: where F q−m,q−1 represents the central F random variable with (q − m, q − 1) degrees of freedom.The above test statistic is proportional to a chi-square test when κ is known.For details one may refer to Fernández et al. (2012).The above methodology can be modified to test The simulation study performed in Fernández et al. (2012) suggests that the power of this test is quite reasonable.Notice that, for a given data set, the p-value obtained by using the above methodology may serve as a useful goodness of fit criterion when comparing two or more plausible circular orders among a set of angular parameters.Larger values may suggest that the estimations are closer to the presumed circular order.Thus the statistical methodology developed in Fernández et al. (2012) can be used not only for testing relative order among the parameters.It can be also useful for selecting a "best fitting" circular order among several circular order candidates.
These tests are implemented in the function cond.test in the R package isocir introduced in the next Section.

Package isocir
We start this section by giving some background on R packages for isotonic regression and analysis of circular data.We then describe the structure of our package isocir (Barragan et al. (2013)) and illustrate it by some examples.

Related Packages
As isotonic regression is a well-known and widely used technique there are many packages in R for performing isotonic regression, such as: • isotone (de Leeuw et al. (2009)): Active set and generalized PAVA for isotone optimization.
Similarly, there are several packages in R for analyzing circular data, such as: • CircStats (Lund and Agostinelli ( 2009)): The implementations of the Circular Statistics from "Topics in circular Statistics" Jammalamadaka and SenGupta (2001).It is an R port from the S-plus library with the same name.
• circular (Agostinelli and Lund (2011)): This package expands in several ways the Circ-Stats package.
Since none of the existing packages for circular data are applicable for analyzing circular data under constraints, in this article we introduce the software package "isotonic inference for circular data", with the acronym isocir, for analyzing circular data under constraints.Our package depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow (2010)).These packages should be installed in the computer before loading isocir.

Package Structure
For the convenience of the reader, we summarize all the functions, arguments and descriptions of our package isocir in Table 1.In the following we describe each function of the software in detail.

Functions
• Functions sce() and mrl() The auxiliary function sce computes the sum of circular errors between a given q-dimensional vector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2).The function mrl computes the mean resultant length for the input data.
• Function CIRE() Using the methodology developed in Rueda et al. (2009), this R function computes the CIRE for a given circular order (1) or a partial order (2).The arguments of this function are summarized in If there is only one replication then data is a vector.The position i in the vector groups contains the group number to which the parameter φ i belongs to.The logical argument circular sets whether the order is wrapped around the circle, i.e. circular order (circular=TRUE) or not, i.e. simple order (circular=FALSE).For example, the simple order cone in the circle would be a non circular order.The output of this function is an object of class isocir (explained later) containing the circular isotonic regression estimator ( φ), the unrestricted circular means (θ) and the corresponding sum of circular error (SCE( φ, θ)).
• Function cond.test() This function performs the conditional test and computes the corresponding p-value for the following hypotheses: H 0 : The angles φ 1 , . . ., φ q follow a (simple or partial) circular order.
H 1 : H 0 is not true.and groups are same as those in function CIRE although in this function groups is the order to be tested instead of the known order.The argument kappa is needed only when data is a vector.If there are no replications in data, the value of κ must be set by the user.Even when there are replicated data, if the user knows the value of κ, it may be introduced and it will be taken into consideration to perfom the conditional test.When κ is unknown and there are replicated data, the parameter is internally estimated by maximum likelihood and κ is shown in the output.The biasCorrect is related to the estimation of κ.If biasCorrect=TRUE the bias correction appearing in Mardia and Jupp (2000) p. 87 is performed in the estimation of κ.The output of this function is an object of class isocir (explained below) with all the results from the conditional test: the CIRE ( φ), the unrestricted circular means (θ), the SCE (SCE( φ, θ)), the kappa value (estimated or introduced) and the p-value of the conditional test.

• Class isocir
Finally we describe the isocir function.This function creates the S3 objects of class isocir which is a list with the following elements: $cirmeans is a list with the unrestricted circular means.Notice that when the argument data is a vector, these values match exactly with the input.However, if there are replicated data, the argument data is a matrix and $cirmeans contains the corresponding unrestricted circular means (θ 1 , . . ., θ q ).
$SCE is the value of the Sum of Circular Errors (SCE( φ, θ)).
$CIRE is a list with the Circular Isotonic Regression Estimator ( φ) obtained under the order defined by the groups argument.
$kappa the value of kappa (either set by the user or estimated).
$pvalue the p-value of the conditional test obtained from the function cond.test.
These objects of class isocir are the output of the functions CIRE and cond.test.The last two elements of the list ($kappa and $pvalue) are NULL if the object comes from the function CIRE.Otherwise, if the object comes from the function cond.testnot only there are the results of the conditional test ($kappa and $pvalue) but also an attribute called estkappa will inform (or rather remind) the user if the value in $kappa has been internally estimated or introduced as a known input.Some S3 methods have also been defined for the class isocir: • isocir(cirmeans = NULL, SCE = NULL, CIRE = NULL, pvalue = NULL, kappa = NULL): This function creates an object of class isocir.
• is.isocir(x):This function checks whether the object x is of class isocir.
• print.isocir(x,decCIRE, decpvalue, deckappa, ...): This S3 method is used to print an object x of class isocir.The number of decimal places can be chosen.
• plot.isocir(x,option = c("CIRE", "cirmeans"), ...): This S3 method is used to plot an object x of class isocir.The argument option gives the user the option to plot the points of the Circular Isotonic Regression Estimator (by default) or the unrestricted circular means.

Examples
In this section we provide examples to illustrate isocir.
We illustrate isocir for estimating the 8 population angular parameters under the following partial circular order constraint: These data are a set of random circular data called cirdata in our package and they can be used by calling as below:

> data(cirdata) > cirdata
Since in this example, there are no replications, we provide data in a vector format.The groups of the order are defined as follows: > orderGroups <-c(1, 1, 1, 2, 2, 3, 4, 4) Thus we obtain CIRE using the function CIRE as follows:

> example1CIRE <-CIRE(cirdata, groups = orderGroups, circular = TRUE)
The output is saved in example1CIRE and the printed output is as follows:

> example1CIRE
Thus the constrained estimates satisfy the required order as follows: where φ is the Circular Isotonic Regression Estimator of φ. Results may be displayed graphically by setting plot(example1CIRE).When done so, a plot with the points of the CIRE is produced.To see the plot for the unrestricted estimates the argument option="cirmeans" can be used (i.e.plot(example1CIRE, option = "cirmeans")).

Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the function cond.test when κ is unknown.As remarked earlier, when κ is unknown we need replicate data to estimate κ.The file datareplic is a matrix where each column contains the values of a replication and each row the angles observed at each population mean.
In this example we have 8 populations and hypotheses regarding the corresponding 8 parameters are as follows: We take the data from the package and set the groups of the order in the argument groups.

> data(datareplic) > orderGroups2 <-c(1:8)
Since replicate data are available, we do not include the argument kappa as we want the function to estimate it.Moreover, we correct the bias in the estimation of κ, so we set biasCorrect=TRUE.Thus we have the following code: The result is the p-value defined in (5).Since the p-value = 0.0034 we reject the null hypothesis and conclude that the parameters do not satisfy the specified circular order.If the user is interested in printing the unrestricted circular means θ then the following command is used: example2test$cirmeans.The result is a list that is saved in the same format as CIRE.Since each group in the circular order has a single element, it is convenient to use the vector format.Hence we have: > round(unlist(example2test$cirmeans), digits = 3)
Notice that there are no replicated data here, since the experiments were not performed under the same experimental conditions.It appears that the 10 experiments were not synchronized (i.e. cells were probably not arrested at the same point in the cell cycle).For this reason, from Table 5 it appears that there is a large variability in the estimates of phase angles of each of the 16 genes.Even though there may be large variability in the estimated values, our interest is in the relative order of phase angles among the 16 genes which does not rely on the location of the pole and hence does not rely on the synchronization.As there are no replicated data, we have a single observation for each of the 16 fission yeast genes in each experiment and, therefore, the values in Table 5 play the role of the unrestricted circular mean in each experiment.Consequently we suppose that, where θ ij is the unrestricted circular mean of the gene i in the experiment j.
Since the 10 experiments may not considered as replications of each other, we performed a separate test for each experiment.Moreover, as explained in Fernández et al. (2012) we assume that the concentration parameter κ j depends on the experiment but not on the gene.The reason for this is that out of the two sources of uncertainty, one specific to the gene and another one due to the experiment (and therefore common to all genes within the experiment), the former source maybe considered negligible relative to the latter as the number of time points used in each time course experiment is fairly large for any specific gene.The κ j values considered for this example are obtained using Fernández et al. (2012).
The procedure used for the computation of these values comes from an analysis of variance type methodology.Under the assumptions made before, the model for the circular means is: where α i is the gene effect and β j is the experiment effect.The proposed model is analogous to the standard two-way analysis of variance model and is fully detailed in the supplementary material of Fernández et al. (2012).
For each of the 10 experiments, we test the hypothesis (6) using the function cond.test that is in our software.The following code gives the p-values for each experiment.
Results are summarized in Table 5.

Conclusions
In this paper the R package isocir has been presented.This package provides useful tools for drawing inferences from circular data under order restrictions.There are two main functions (CIRE and cond.test).The first one computes the CIRE, the circular version of the widely known isotonic regression in R q .The second one is designed for testing circular hypotheses using a conditional test.We have also created the class isocir in order to properly save all the results.Although we illustrated the proposed methodology using an example from cell biology, the proposed software can be applied to a wide range of contexts.For example, biologists working on circadian clocks may be interested in the testing for the conservation of circular order among circadian genes between two tissues (e.g., Liu et al. (2006)).We would like to emphasize that the field of constrained inference on a unit circle is in its infancy and is wide open for new developments both in methods as well as applications.As observed in the introduction, such constrained inference problems arise naturally in many applications.Therefore we expect the software described in this paper to be widely used by researchers working in such areas.

Table 1 :
Summary of the components of the package isocir

Table 2 .
The input variable data is a matrix where each column contains

Table 2 :
Arguments of the CIRE function the vector of unconstrained angular means corresponding to each replication.

Table 3 :
Table 3 and are explained below.Arguments data Arguments of the cond.testfunction