EMMIX-uskew: An R Package for Fitting Mixtures of Multivariate Skew t-distributions via the EM Algorithm

This paper describes an algorithm for fitting finite mixtures of unrestricted Multivariate Skew t (FM-uMST) distributions. The package EMMIX-uskew implements a closed-form expectation-maximization (EM) algorithm for computing the maximum likelihood (ML) estimates of the parameters for the (unrestricted) FM-MST model in R. EMMIX-uskew also supports visualization of fitted contours in two and three dimensions, and random sample generation from a specified FM-uMST distribution. Finite mixtures of skew t-distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy tail behaviour, for example, datasets from flow cytometry. In recent years, various versions of mixtures with multivariate skew t (MST) distributions have been proposed. However, these models adopted some restricted characterizations of the component MST distributions so that the E-step of the EM algorithm can be evaluated in closed form. This paper focuses on mixtures with unrestricted MST components, and describes an iterative algorithm for the computation of the ML estimates of its model parameters. The usefulness of the proposed algorithm is demonstrated in three applications to real data sets. The first example illustrates the use of the main function fmmst in the package by fitting a MST distribution to a bivariate unimodal flow cytometric sample. The second example fits a mixture of MST distributions to the Australian Institute of Sport (AIS) data, and demonstrate that EMMIX-uskew can provide better clustering results than mixtures with restricted MST components. In the third example, EMMIX-uskew is applied to classify cells in a trivariate flow cytometric dataset. Comparisons with other available methods suggests that the EMMIX-uskew result achieved a lower misclassification rate with respect to the labels given by benchmark gating analysis.


Introduction
In many practical problems, data are often skewed, heterogeneous, and/or contain outliers.Finite mixture distributions of skewed distributions have become increasingly popular in modelling and analyzing such data.This use of finite mixture distributions to model heterogeneous data has undergone intensive development in the past decades, as witnessed by the numerous applications in various scientific fields such as bioinformatics, cluster analysis, genetics, information processing, medicine, and pattern recognition.For a comprehensive survey on mixture models and their applications see, for example, the monographs by Everitt and Hand (1981), Titterington et al. (1985), McLachlan and Basford (1988), Lindsay (1995), Böhning (2000), McLachlan and Peel (2000), and Frühwirth-Schnatter (2006), the edited volume of Mengersen et al. (2011), and also the papers by Banfield and Raftery (1993) and Fraley and Raftery (1999).
In recent years, finite mixtures of skew t-distributions have been exploited as an effective tool in modelling high-dimensional multimodal and asymmetric datasets; see, for example, Pyne et al. (2009a) and Frühwirth-Schnatter and Pyne (2010).Following the introduction of the skew normal (SN) distribution by Azzalini (1985), several authors have studied skewed extensions of the t-distribution.Finite mixture models with multivariate skew t (MST) components was first proposed by Pyne et al. (2009a) in a study of an automated approach to the analysis of flow cytometry data.Wang et al. (2009) has given a package EMMIXskew for the implementation in R (R Development Team 2011) of their algorithm.More recently, Basso et al. (2010) studied a class of mixture models where the components densities are scale mixtures of univariate skew normal distributions, known as the skew normal/independent (SNI) family of distributions, which include the (univariate) skew normal and skew t-distributions as special cases.This work was later extended to the multivariate case in Cabral et al. (2012), and was implemented in an R package mixsmsn.However, in these characterizations, restrictions were imposed on the component skew t-distributions in order to obtain manageable analytical expressions for the conditional expectations involved in the E-step of the EM algorithm.These versions of the skew t-distributions are known as the 'restricted' form of the MST distribution; see Lee and McLachlan (2012a) for further discussion on this.
In this paper, we present an algorithm for the fitting of the unrestricted skew t-mixture model.We show that an EM algorithm can be implemented exactly without restricting the characterizations of the component MST distributions.Closed form expressions can be obtained for the E-step conditional expectations by recognizing that they can be formulated as moments of a multivariate non-central truncated t-variate, which can be further expressed in terms of central t-distributions.The algorithm is implemented in R in the package EMMIXuskew, available at http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-skew.
The package EMMIX-uskew consists of three main functions: fmmst, rfmmst, and contour.fmmst.The main function fmmst fits a mixture of unrestricted MST (uMST) distributions using an EM algorithm described in Section 3. The function rfmmst generates random samples from mixtures of uMST distributions.For a user friendly visualisation of the fitted models, fmmst.contourprovides 2D contour maps of the fitted bivariate densities and 3D displays with interactive viewpoint navigation facility for trivariate densities.
The remainder of this paper is organized as follows.Section 2 provides a brief description of the uMST distribution and defines the FM-uMST model.Section 3 presents an EM algorithm for fitting the FM-uMST model.In the next section, an explanation of how to fit, visualize, and interpret the FM-uMST models using EMMIX-uskew is presented.The usage of EMMIXuskew is illustrated with three applications and comparisons are made with some restricted FM-MST models and other clustering methods.Finally, we conclude with a brief summary of our results.

Finite mixtures of multivariate skew t-distributions
We begin by defining the (unrestricted) multivariate skew t-density.Let Y be a p-dimensional random vector.Then Y is said to follow a p-dimensional unrestricted skew t-distribution (Sahu et al. 2003) with p × 1 location vector µ, p × p scale matrix Σ, p × 1 skewness vector δ, and (scalar) degrees of freedom ν, if its probability density function (pdf) is given by where Here the operator diag(δ) denotes a diagonal matrix with diagonal elements specified by the vector δ.Also, we let t p,ν (.; µ, Σ) be the p-dimensional t-distribution with location vector µ, scale matrix Σ, and degrees of freedom ν, and T p,ν (.; µ, Σ) the corresponding (cumulative) distribution function.The notation Y ∼ uMST p,ν (µ, Σ, δ) will be used.Note that when δ = 0, (1) reduces to the symmetric t-density t p,ν (y; µ, Σ).Also, when ν → ∞, we obtain the (unrestricted) skew normal distribution.
Various versions of the multivariate skew t-density have been proposed in recent years.It is worth noting that the versions considered by Azzalini and Capitanio (2003), Gupta (2003), and Lachos et al. (2010), among others, are different from (1).These versions are simpler in that the skew t-density is defined in terms involving only the univariate t-distribution function instead of the multivariate form of the latter as used in (1).These simplified characterizations have the advantage of having closed form expressions for the conditional expectations that have to be calculated on the E-step.The reader is referred to Lee and McLachlan (2012a,b) for a discussion on different forms of skew t-distributions.We shall adopt the unrestricted form (1) of the MST distribution here as proposed by Sahu et al. (2003), and describe a computationally efficient EM algorithm for fitting this model.
A g-component finite mixture of uMST distributions has density given by where f p (y; µ h , Σ h , δ h , ν h ) denotes the hth uMST component of the mixture model as defined by (1), with location parameter µ h , scale matrix Σ h , skew parameter δ h , and degrees of freedom ν h .The mixing proportions π h satisfy π h ≥ 0 (h = 1, . . ., g) and g h=1 π h = 1.We shall denote the model defined by (2) by the FM-uMST (finite mixture of uMST) distributions.Let Ψ contain all the unknown parameters of the FM-uMST model; that is, Ψ = π 1 , . . ., π g−1 , θ T 1 , . . ., θ T g T where now θ h consists of the unknown parameters of the hth component density function.The density values for a uMST and FM-uMST distribution can be evaluated using the functions dmst and dfmmst in EMMIX-uskew.
Random samples of uMST variates can be generated by adopting a stochastic representation of (1) (Lin 2010).If Y ∼ uM ST p,ν (µ, Σ, δ), then where the random variables are independent, and gamma(α, β) denotes the gamma distribution with shape and scale parameters given by α and β respectively.Sampling of uMST and FM-uMST variates are implemented in EMMIX-uskew in the rmst and rfmmst functions, respectively.

The EMMIX-uskew algorithm
From (3) to (6), the uMST distribution admits a convenient hierarchical characterization that facilitates the computation of the maximum likelihood estimator (MLE) of the unknown model parameters using the EM algorithm, namely, where ∆ h = diag (δ h ), HN p (µ, Σ) denotes the p-dimensional half-normal distribution with location parameter µ and scale matrix Σ.

Fitting of FM-uMST model via the EM algorithm
Let Y 1 , • • • , Y n be n independent observations of Y .To formulate the estimation of the unknown parameters as an incomplete-data problem in the EM framework, we introduce a set of latent component labels z j = (z 1j , . . ., z gj ) (j = 1, . . ., n) in addition to the unobservable variables u j and w j , where each element z hj is a zero-one indicator variable with z hj = 1 if y j belongs to the hth component, and zero otherwise.Thus, g h=1 z hj = 1 (j = 1, . . ., n).
It follows that the random vector Z j corresponding to z j follows a multinomial distribution with one trial and cell probabilities π 1 , . . ., π g ; that is, Z j ∼ Mult g (1; π 1 , . . ., π g ).
The complete-data log likelihood function can be factored into the marginal densities of the z j , the conditional densities of the w j given z j , and the conditional densities of the y j given u j , w j , and z j .Accordingly, the complete-data log likelihood is given by where and where Here Ψ contains all the unknown parameters of the FM-uMST model.
The implementation of the EM algorithm requires alternating repeatedly the E-and M-steps until convergence in the case where the changes in the log likelihood values are less than some specified small value.The E-step calculates the expectation of the complete-data log likelihood given the observed data y using the current estimate of the parameters, known as the Q-function, given by The M-step then maximizes the Q-function with respect to the parameters Ψ.
On the (k + 1)th iteration, the E-step requires the calculation of the conditional expectations The conditional expectation of Z hj given the observed data, is given, using Bayes' Theorem, by which can be interpreted as the posterior probability of membership of the hth component by y j , using the current estimate Ψ (k) for Ψ.
It can be shown that the conditional expectations e 2,j , and e 3,j are given by and where X is a p-dimensional t-variate truncated to the positive hyperplane R + , which is distributed as hj , ν where tt p,ν (µ, Σ; R + ) denotes the positively truncated t-distribution with location vector µ, scale matrix Σ, and ν degrees of freedom.The truncated moments E(X) and E(XX T ) can be swiftly evaluated by noting that they can be expressed in terms of the distribution function of a (non-truncated) multivariate central t-random vector; Lee andMcLachlan (2011, 2012a).Recently, Ho et al. (2012) have considered the moments of of the doubly truncated multivariate t-distribution, but their result corresponding to ( 16) is incorrect; see Lee and McLachlan (2012a) for further details.
The (k + 1)th M-step consists of maximization of the Q-function with respect to Ψ.It follows that an updated estimate of the unknown parameters of the FM-uMST model is given by hj e hj e hj e and where ⊙ denotes element-wise matrix product.Note that (18) and also ( 16) are given incorrectly in Lee and McLachlan (2011).
An update ν (k+1) h of the degrees of freedom is obtained by solving iteratively the equation log ν is the Digamma function.This last equation has been simplified by making use of a one-step-late approximation (Green 1990) in updating the estimate of ν h .As a consequence, it can affect the monotonicity of the likelihood function.Our experience suggests that this rarely happens.The monotonicity of the likelihood can be preserved by working with the exact expression as given by Equation ( 73) in Lee and McLachlan (2012a).
There is an option in the program to use this more time consuming updating of ν h .The algorithm described in this Section is implemented as the fmmst function in EMMIX-uskew.

Choosing initial values
It is important to obtain suitable initial values in order for fmmst to converge quickly.In EMMIX-uskew, starting values for the model parameters are based on an initial clustering given by k-means.Twenty attempts of k-means are performed, and the starting component labels z (0) j (j = 1, . . ., n) are initialized according to the clustering result with the highest relative log likelihood (see Lee and McLachlan (2012a)).The other parameters are initialized as follows: where S h is the sample covariance of the hth component, and where γ h is the sample skewness of the hth component, whose ith element is given by and where y ij denotes the ith element of the jth observation, and µ i is the ith element of µ.
Here, s h denotes the vector created by extracting the main diagonal of S h , and the vector s * h is created by taking the square root of each element in s h .The scalar a is varied systematically across the interval (0, 1) to search for a (relatively) optimal set of starting values for the model parameters.

Stopping rule
EMMIX-uskew adopts a traditional stopping criterion which is based on the absolute change in the size of the log likelihood.An Aitken acceleration-based strategy is described in Lin (2010).The algorithm is terminated when the absolute difference between the log likelihood value and the asymptotic log likelihood value is less than a sepcified tolerance, ǫ, that is where is the asymptotic estimate of the log likelihood at the (k + 1)th iteration, given by L , and is the Aitken's acceleration at the kth iteration.The default tolerance is ǫ = 10 −3 , but the user can specify a different value.

Using the EMMIX-uskew package
The parameters of the FM-uMST model in EMMIX-uskew are specified as a list structure containing the elements described in Table 1.The parameters µ, Σ, and δ are each implemented as a list of g matrices, where g is the number of components in the fitted model.For example, mu [[2]] is a p × 1 matrix representing µ 2 .Each sigma [[h]] (h = 1, . . ., g) is a p × p matrix representing the symmetric positive definite scale matrix of the hth component.The parameters dof and pro are g by 1 arrays, representing the vector of degrees of freedom and the vector of mixing proportions for each component, respectively.The probability density function of a multivariate skew t-distribution is calculated by the dmst function.The parameter dat is an n × p matrix, containing the coordinates of the n point(s) at which the density is to be evaluated.The following command will return a vector of n density values.

Fitting a single multivariate skew t-distribution
To fit a specified FM-uMST model, the core function in EMMIX-uskew, fmmst, is used.This implements the algorithm described in Section 3. A typical function call of fmmst is: fmmst(g, dat, initial=NULL, known=NULL, itmax=100, eps=1e-3, nkmeans=20, print=TRUE) The main arguments used within this function are: • g: a scalar that specifies the number of uMST components to be fitted.
• dat: an n × p matrix containing the data.
• initial: a list that specifies the initial values used to start the algorithm.
• known: a list that specifies any model parameters that are known and so not required to be estimated.
• itmax: a scalar that specifies the maximum number of iterations to be used.
• eps: a scalar that specifies the termination criterion of the EM algorithm loop.
• nkmeans: an integer that specify the number of k-means trials to be used to select the best set of initial values.
Note that if the initial values of the model parameters are provided by the user, the argument initial is expected to be structured as described in Table 1.Similarly, known is expected to have the same structure.When initial=NULL, fmmst will generate a set of initial values using the procedure described in Section 3.2.Any parameters specified in known are taken as known parameters and hence are not estimated by fmmst.There is no need to specify the values of all the parameters in initial and known when only some of the parameters are known.Parameters that are not specified in the function call are estimated by fmmst.By default, fmmst performs 20 k-means attempts when searching for the best initial value.The user can specify a different value using nkmeans.The termination criterion for the EMMIXuskew algorithm is controlled by the parameters itmax and eps.The EM loop terminates when either one of the two criterion is satisfied, whichever occurs first: (a) the EM loop reaches itmax iterations (default is 100 iterations), or (b) the absolute difference between the current log likelihood value and that the asymptotic log lileklihood value is smaller than eps (default is 1e-3).The last argument of fmmst is print.When the option print is set to TRUE (default), fmmst prints the log likelihood value at each iteration and displays a summary of the parameters of the fitted model after termination.To turn off the print mode, simply set print=FALSE.For further details of the arguments of fmmst, the reader is referred to the documentation of fmmst.This can be accessed by typing ?fmmst at the R command prompt.
We consider now the T-cell phosphorylation dataset (Maier et al. 2007) as an example of asymmetrically distributed data, available from Pyne et al. (2009b).The data contain measurements of blood samples stained with four antibodies, CD4, CD45RA, SLP76, and ZAP70.
For illustration, we randomly select 500 observations and focus on two of the variables, CD4 and ZAP70.To fit a MST model to this bivariate Lymphoma dataset, under the default settings, the following command is issued: A summary of the output of the fitted model can be obtained using the summary function.
This prints the values of the fitted model parameters for each component.For a fitted uMST model, the weighting proportion (which is 1) is not printed.The following output shows a typical summary of a fitted single component uMST model.

Degrees of freedom: 5.851434
To view a more detailed output of the fmmst function, the print function is called.This outputs a list containing 11 elements.The first five elements give the estimates of the parameters of the fitted FM-uMST model, as described in Table 1.
The posterior probability of component membership is given by the output argument tau, a g × n matrix where the rows corresponds to the component number.The final partition of each data point, based on tau, is stored as clusters.The value of the log likelihood function, evaluated with the current parameter estimates, is given by loglik.The last two arguments aic and bic are the values of the Akaike information criterion (AIC) and the Bayes information criterion (BIC), respectively.The following output shows an excerpt from the second part of the print output of the fitted model.
As mentioned previously, initial values for the EM algorithm can be specified by the user.Suppose an initial guess of µ for the above example is (5, 6) T , then one can specify µ (0) to be (5, 6) T by issuing the command: <-list(c(5, 6)) R> fmmst (1,LymphoSample,initial=obj) This will start the EM algorithm with the specified value for µ (0) , and the other parameters using (20).The user can further demand more k-means trials to be performed by increasing nkmeans, for example, to 50 trials.This can be achieved by issuing the following command.

Fitting mixtures of multivariate skew t-distributions
This section presents an illustration of fitting a mixture of unrestricted skew t-distributions to some bivariate bimodal asymmetric data.We consider the Australian Institute of Sport (AIS) data from Cook and Weisberg (1994), where thirteen body measurements on 102 male and 100 female athletes were recorded.In this example, we consider the clustering of the data with a two component skew t-mixture model based on the two variables Height and Body fat.By setting print=TRUE, we can examine the value of the log likelihood function at each iteration.ais[,c(2,12) We compare the results with two other model-based clustering methods provided by the package mixsmsn (Prates et al. 2011) and EMMIX-skew (Wang et al. 2009).As mentioned previously, this two models are based on mixture of restricted versions of the multivariate skew t-distributions.The first model adopts the skew normal/independent skew t-distribution (Cabral et al. 2012) as its component densities, which is equivalent to the restricted skew tdistribution (Pyne et al. 2009a) used in the second model.However, it should be noted that, in the ECME algorithm implemented in the package mixsmsn, the component degrees of freedom are constrained to be the same.A comparison of the table of cluster labels (permuted where necessary to minimize the number of misallocations) with the true class labels (given by ais$Sex in this example; M for male, F for female) reveals that the FM-uMST model has a higher number of correct allocations (183 compared to 162 and 157 given by mixsmsn and EMMIX-skew, respectively).Thus, the unrestricted FM-uMST model in EMMIX-uskew gives a more accurate clustering in this case.<-smsn.mmix(ais[c(2,12)],g=2, family="Skew.t",group=TRUE) R> 12)], 2, "mst", print=FALSE) R> table (ais$Sex,Fit3$group) 1 2 M 91 11 F 29 71 R> table (ais$Sex,Fit4$clust) 1 2 M 89 13 F 32 68 R> table (ais$Sex,Fit2$clusters) 1 2 M 97 5 F 14 86 4.4.Testing for the significance of the skewness parameter When we set δ = 0 in (1), we obtain the multivariate t-density.The function fmmt (g,dat,initial=NULL,known=NULL,itmax=100,nkmeans=20,print=TRUE) implements the EM algorithm for fitting finite mixtures of multivariate t (FM-MT) distributions (McLachlan and Peel 2000).

R> library("mixsmsn") R> Fit3
To test whether the skewness parameter in the FM-uMST model is significant, one can construct a likelihood ratio test for the null hypothesis H 0 : δ 1 = . . .= δ g = 0 versus the alternative hypothesis where at least one of δ h (h = 1, . . ., g) is different from 0. This leads to the test statistic where L t and L st denote the log likelihood value associated with the FM-MT model and the FM-uMST model, respectively.It follows that the test statistics is asymptotically distributed as χ 2 r , where r is the difference between the number of parameters under the alternative and null hypotheses.This test is implemented in the function delta.test(stmodel=NULL, tmodel=NULL, stloglik, tloglik, r), where the first two arguments are the output from fmmst and fmmt respectively.Alternatively, the user can provide the log likelihood values of the two models and the value of r directly by specifying the last three arguments of delta.test().The output of the function is the P -value of the test.Consider again the AIS example in Section 4.6.If we examine the cluster labels given by the FM-MT model, we can see that it yields a noticeably higher number of misallocations than the skew t-mixture model.A test for δ = 0 can be performed by issuing the following commands.In this case, the small P -value suggests there is strong evidence that the skewness parameter in the FM-uMST fit is significantly different from zero.ais[,c(2,12)]) R> table (ais$Sex,Fit5$clusters) 1 2 M 77 23 F 14 88 R> delta.test(Fit2,Fit5)0.0003798128

Discriminant analysis
Discriminant analysis based on a specified FM-uMST model can be performed using the fmmstDA function.
fmmstDA (g, dat, model) The data in dat are assigned to the cluster corresponding to the component of the FM-uMST model with the highest posterior probability.Specifications of the model parameters must be provided in model, which is typically an output from fmmst.Optionally, model can be specified by the user as a list of at least six elements: the five model parameters, and a vector of cluster labels clusters.The following commands shows an example using fmmstDA.A random sample of FM-uMST variables is generated from rfmmst, the first part of which is used as training set, and the second is a testing set.The FM-uMST model fitted to the training set is then used for classifying the data in the testing set.
The grid size is determined by grid.By default, the data points are included in the plot.If only the contour are required, the option drawpoints=FALSE should be set.When including the points in a plot, clusters specifies the component labels of each point according to which the data points will be coloured.The argument levels is either an integer specifying the number of contour lines to be plotted, or a vector of quantile values.For fmmst.contour.3d,only the 90th percentile contour is plotted by default.If more contours are required, the argument levels should be a vector of the required quantiles.For example, if a plot of the 25 th , 50 th , and 75 th percentiles are required, then levels = c(0.25,0.5, 0.75).Bivariate data have the option of being plotted as an intensity map instead of scatter plot.This can be obtained by setting map="heat".There is also an option for plotting a cluster map of a fitted model using the option map="cluster".Plots for specific components of a mixture model can be requested with the argument component.When component=NULL (which is default), the mixture contour is plotted.When component is a vector with length between 1 and g, the specified components are plotted and the mixing proportion is not taken into account.
The last argument of the fmmst.contourfunctions "..." allows the user to pass additional arguments to the plot function, such as the colour and size of the points.
Figure 1a shows the contour of the fitted MST model to the Lymphoma data.Here a heatmap of the original data is used.This plot can be generated via the command, R> fmmst.contour.2d(Lympho,model=Fit, map="heat", xlab="SLP76", ylab="ZAP70") The default fmmst.contour.2dfunction will return a scatter plot of the data in 2D superimposed with the contours of the fitted mixture model.For example, the following command generates a contour plot of the fitted FM-uMST model to the ais data in Section 4.2 (Figure 1b).Note that fmmst.contour.2dcoloured the sample points according to the clustering given by the argument clusters: R> fmmst.contour.2d(ais[,c(2,12)],model=Fit2, clusters=label, xlab="Ht", ylab="Bfat") Suppose we are interested in visualizing a clustering map of the fitted model to the simulated data in Section 4.5.This plot can be generated by issuing the following command.
To demonstrate the use of fmmst.contour.3d,we consider the clustering of a trivariate Diffuse Large B-cell Lymphoma (DLBCL) dataset provided by the British Columbia Cancer Agency.The data contain fluorescent intensities of multiple conjugated antibodies (known as markers) stained on a sample of over 8000 cells derived from the lymph nodes of patients diagnosed with DLBCL.In flow cytometric analysis, these parallel measurements of fluorescent intensities can be used to study the differential expression of different surface and intracellular proteins of a given blood sample.The analysis typically involves the identification of cell populations from the multidimensional dataset, currently performed manually by visually separating regions (gates) of interests on a series of sequential bivariate projections of the data, a process known as gating.Due to the subjective and time-consuming nature of this approach, and the difficulty in detecting higher-dimensional inter-marker relationships, many efforts have been made to develop computational methods to automate the gating process.
The DLBCL samples here were stained with three markers CD3, CD5, and CD19.The task is to automatically gate the cells by clustering the data into four groups.Hence we fit a fourcomponent FM-uMST model to the data.The maximum number of iterations was increased to 300.
A scatterplot of the data is shown in Figure 2, where the dots are coloured according to the clustering provided by human experts, which are considered as the 'true' class labels.
Figure 2b shows the 95 th percentile density contours of the four components of the fitted model which are displayed with matching colours.The 3D plot uses the rgl visualization device system, and hence supports user friendly interactive navigation.The plots can be rotated in real-time to select a suitable viewpoint.The following code can be used to generate the 3D plots in Figure 2.
R> Fit6 <-fmmst (DLBCL,4,itmax=300) R> fmmst.contour.3d(DLBCL,model=Fit4,level=0.9,drawpoints=FALSE,xlab="FL1.LOG",ylab="FL2.LOG",zlab="FL4.LOG",quantile=0.95)The effectiveness of a clustering can be obtained by comparing its error rate with the cluster labels from manual expert gating taken to be the true class labels.This error rate is calculated for each permutation of the cluster labels of the clustering result under consideration and the rate reported is the minimum value over all such permutations.Note that dead cells were removed before evaluating the error rate against the benchmark results.For comparison, we calculated the error rate associated with the clustering results given by three other methods -FLAME (Pyne et al. 2009a), flowClust (Lo et al. 2008) and flowMeans (Aghaeepour et al. 2011).From Table 2, the FM-uMST model clearly shows superior performances in this dataset.

Concluding remarks
We have presented the R package EMMIX-uskew for fitting finite mixtures of unrestricted multivariate skew t-distributions to heterogeneous asymmetric data.The package implements a closed-form EM algorithm for fitting FM-uMST models and provides user-friendly visualization of the fitted contours in 2D and 3D.The major features of the software have been demonstrated on three real examples on the T-cell phosphorylation data, the Australian Institute of Sports (AIS) data, and the DLBCL dataset.The clustering results were compared to those obtained via mixtures of restricted multivariate skew t-distributions and other methods.In both the AIS and DLBCL illustrations, the unrestricted model gave better clustering results with respect to the true class labels.It should be noted that the fitting of the unrestricted skew t-mixture model can be quite slow in higher dimensional applications, due to the computationally intensive procedure involved in the calculation of multivariate t-distribution function values.The algorithm would benefit from further research on applicable acceleration techniques, for example, the implementation of the SQUAREM strategy (Varadhan and Roland 2008).

Figure 1 :
Figure 1: 2D contour plots generated by the fmmst.contour.2dfunction.(a) The fitted contour of the single component uMST model plotted over the hue intensity diagram of the Lymphoma dataset; (b) the default mixture contour plot of the fitted two-component FM-uMST model of the AIS dataset; (c) the contour of the individual components of the three-component model fitted to a bivariate synthetic sample plotted over the cluster map of the sample.

Figure 2 :
Figure 2: 3D contours plot of the DLBCL dataset generated by the fmmst.contour.3dfunction.(a) A scatterplot of the data coloured according to the the true clustering labels of the DLBCL dataset; (b) fitted contour of the three component FM-uMST model for the DLBCL dataset.

Table 1 :
Structure of the model parameters in EMMIX-uskew.
An output of the rfmmst function consists of p + 1 columns.The first p columns are the coordinates of the generated sample.The last column indicates from which component each data point is generated.Executing the above command will generate an output similar to the following: ], print=TRUE)

Table 2 :
Error rate of misclassification of four methods for the DLBCL dataset.