A comparison of algorithms for fitting the PARAFAC model

https://doi.org/10.1016/j.csda.2004.11.013Get rights and content

Abstract

A multitude of algorithms have been developed to fit a trilinear PARAFAC model to a three-way array. Limits and advantages of some of the available methods (i.e. GRAM-DTLD, PARAFAC-ALS, ASD, SWATLD, PMF3 and dGN) are compared. The algorithms are explained in general terms together with two approaches to accelerate them: line search and compression. In order to compare the different methods, 720 sets of artificial data were generated with varying level and type of noise, collinearity of the factors and rank. Two PARAFAC models were fitted on each data set: the first having the correct number of factors F and the second with F+1 components (the objective being to assess the sensitivity of the different approaches to the over-factoring problem, i.e. when the number of extracted components exceeds the rank of the array). The algorithms have also been tested on two real data sets of fluorescence measurements, again by extracting both the right and an exceeding number of factors. The evaluations are based on: number of iterations necessary to reach convergence, time consumption, quality of the solution and amount of resources required for the calculations (primarily memory).

Introduction

The PARAFAC (PARallel FACtor analysis) model was introduced in 1970 by Harshman (1970) and simultaneously by Carroll and Chang (1970) under the name CANDECOMP. For a three-way data array X̲, the PARAFAC model is defined asxijk=f=1Faifbjfckf+rijk,i=1I,j=1J,k=1K,where xijk is the measured value, aif, bjf, and ckf represent the parameters to estimate, rijk are the residuals and F is the number of factors extracted.

In general terms, fitting model (1) boils down to minimising (usually in a least squares sense) the fitting error made. This means finding the parameters a11,a12,,cKF that minimise the loss function:La11,a12,,cKF=i=1Ij=1Jk=1Kxijk-f=1Faifbjfckf2.Several algorithms for solving such problems are described in the literature (Faber et al., 2003, Hayashi and Hayashi, 1982, Paatero, 1997, Tomasi and Bro, 2004). Some of those tested in this work (DTLD, ASD and SWATLD) do not minimise (2) and in fact their loss functions are not strictly well-defined (Faber et al., 2003).

While the PARAFAC model remains the same, the different methods of fitting it to a three-way array X̲ can be better explained introducing distinct notations. The parameters can be gathered in three loading matrices A (also referred to as scores matrix), B and C defined asA=aifi=1I,f=1F=a1a2aF,B=bjfj=1J,f=1F=b1b2bFandC=ckfk=1K,f=1F=c1c2cF,where af, bf and cf denote the columns of A, B and C respectively.

Using the matricised format for multi-way arrays (Bro, 1998) and with the introduction of the column-wise Khatri–Rao product (Rao and Mitra, 1971) (see Appendix C), the model can be written asX(I×JK)=ACBT+R(I×JK),where “T” means the transpose and (I×JK) refers to the way the multi-way array is matricised (K slabs of size I×J are put one beside the other forming a matrix of dimension I×JK).

Another possibility is to express the model in a slab-wise form as:X..k=ADkBT+R..k,where Dk is a diagonal matrix containing the k-th row of C, X..k is the I×J matrix representing the k-th frontal slab and R..k is the corresponding matrix of the residuals. Eqs. (4) and (5) have to be trivially modified in case other matricisations (e.g. J×IK) or slabs (e.g. the horizontal slab Xi.. of size K×J) are required (Bro, 1998).

The algorithms fitting the PARAFAC model can be classified in three main groups: alternating algorithms, which update only a subset of the parameters at each step; derivative-based methods, seeking an update for all the parameters simultaneously by successive approximations; and direct (non-iterative) procedures. To the first group belong PARAFAC-ALS (Alternating Least Squares (Harshman, 1970)), ASD (Alternating Slice-wise Diagonalisation (Jiang et al., 2000)) and SWATLD (Self Weighted Alternating TriLinear Decomposition (Chen et al., 2000)). The second set is represented by PMF3 (Positive Matrix Factorisation for 3-way arrays (Paatero, 1997)) and dGN (damped Gauss–Newton, also known as Levenberg–Marquadt (Paatero, 1997, Tomasi and Bro, 2004)). As for the third grouping of algorithms, the most known implementations are the Generalised Rank Annihilation Method (GRAM (Sanchez and Kowalski, 1986)) and the Direct TriLinear Decomposition method (DTLD), both based on a generalized eigenvalue problem. All these algorithms will only be described in general terms as more details are available in the original papers.

Fitting the PARAFAC model presents numerous problems. One aspect affecting speed of convergence and retrieval of the underlying solutions is the condition number of the loading matrices, which reflects both collinearity and relative magnitude of the factors (Hopke et al., 1998; Kiers, 1998). Compression, which is outlined in Section 2.5, has been reported to be beneficial in this respect and has the added advantage of reducing the computational expense (Bro and Andersson, 1998; Kiers, 1998). Another problem is the so-called two factor degeneracy (2FD), i.e. the presence in the solution of two factors that are almost perfectly collinear but have opposite signs and almost cancel out each others’ contribution. The essential problem with 2FDs is that the degenerate factors can grow arbitrarily large while the loss function continuously (and very slowly) decreases. 2FDs may appear in the final solution as a consequence of e.g. a wrong estimation of the rank of the array, or some aspects specific to the data set at hand, (Kruskal et al., 1989, Paatero, 2000). A connection has been established between some of the possible causes of 2FDs in the final solution and the so-called swamps (Mitchell and Burdick, 1994; Rayens and Mitchell, 1997), which are sequences of iterations where the interim solutions contain features similar to 2FDs of increasing severity and the loss function decreases very slowly. When an (iterative) algorithm encounters a swamp, it either emerges from it (i.e. after an unpredictably large number of iterations, the loss function starts decreasing more rapidly again and the 2FD slowly disappears) or it reaches an earlier stop because one of the convergence criteria is suddenly met. Great attention has been given in the development of new methods capable of dealing effectively with swamps and more in general with the more frequent high collinearity case. Two examples are regularisation (Paatero, 1997; Rayens and Mitchell, 1997) and line search (Bro, 1998; Harshman, 1970) (described in Section 2.4).

Section snippets

DTLD

Ho et al., 1978, Ho et al., 1980, Ho et al., 1981 developed an algorithm called RAFA (Rank Annihilation Factor Analysis) for estimating the concentration of a chemical analyte in an unknown sample-matrix solely using the measurements of the unknown sample and of a pure standard. This property was coined the second-order advantage, as it is obtained by using the second-order or two-way structure of the individual sample measurements instead of vectorising the corresponding matrix. The

Experimental part

Nine algorithms are studied in this section using both simulated and real data: DTLD-GRAM, PARAFAC-ALS, ASD, SWATLD, dGN, PMF3, PARAFAC-ALS with compression, dGN with compression and PMF3 with compression. For space reasons the single implementations will not be discussed in detail; in general, the MATLAB 6.5 (www.themathworks.com) guidelines for improving performances were followed (e.g. maximising the use of built-in functions). DTLD and PARAFAC-ALS are part of the N-Way toolbox (Andersson

Conclusions

The tests on the nine algorithms showed quite clearly that the advantages of some of the recently proposed alternatives to PARAFAC-ALS were somewhat overestimated.

In particular, ASD appeared consistently inferior to the other iterative methods on both the simulated and the real data sets. Conversely, SWATLD showed a good capability of finding the real underlying components, albeit biased by the fact that it yields solutions with higher core consistency than the least squares one. The distortion

Acknowledgements

The authors are grateful for support from STVF through the ODIN consortium. They would also like to thank N.M. Faber for providing some of the m-files.

References (44)

  • Å. Björck

    Numerical Methods for Least Squares Problems

    (1996)
  • R. Bro

    Multi-way analysis in the food industry

    (1998)
  • R. Bro et al.

    A new efficient method for determining the number of components in PARAFAC models

    J. Chemometrics

    (2003)
  • J.D. Carroll et al.

    Analysis of individual differences in multidimensional scaling via an N-way generalization of Eckart–Young decomposition

    Psychometrika

    (1970)
  • J.D. Carroll et al.

    Candelinc—a general-approach to multidimensional-analysis of many-way arrays with linear constraints on parameters

    Psychometrika

    (1980)
  • Z.P. Chen et al.

    Pseudo alternating least squares algorithm for trilinear decomposition

    J. Chemometrics

    (2001)
  • G.H. Golub et al.

    Matrix Computations

    (1996)
  • Harshman, R.A., 1970. Foundations of the PARAFAC procedure: models and conditions for an ’explanatory’ multi-modal...
  • C. Hayashi et al.

    A new algorithm to solve PARAFAC-model

    Behaviormetrika

    (1982)
  • C.N. Ho et al.

    Application of method of rank annihilation to quantitative-analyses of multicomponent fluorescence data from video fluorometer

    Anal. Chem.

    (1978)
  • C.N. Ho et al.

    Application of the method of rank annihilation to fluorescent multicomponent mixtures of polynuclear aromatic-hydrocarbons

    Anal. Chem.

    (1980)
  • C.N. Ho et al.

    Simultaneous multicomponent rank annihilation and applications to multicomponent fluorescent data acquired by the video fluorometer

    Anal. Chem.

    (1981)
  • Cited by (335)

    View all citing articles on Scopus
    View full text