A General Framework for Multivariate Analysis with Optimal Scaling: The R Package aspect

In a series of papers de Leeuw developed a general framework for multivariate analysis with optimal scaling. The basic idea of optimal scaling is to transform the observed variables (categories) in terms of quantiﬁcations. In the approach presented here the multivariate data are collected into a multivariable. An aspect of a multivariable is a function that is used to measure how well the multivariable satisﬁes some criterion. Ba-sically we can think of two diﬀerent families of aspects which unify many well-known multivariate methods: Correlational aspects based on sums of correlations, eigenvalues and determinants which unify multiple regression, path analysis, correspondence analysis, nonlinear PCA, etc. Non-correlational aspects which linearize bivariate regressions and can be used for SEM preprocessing with categorical data. Additionally, other aspects can be established that do not correspond to classical techniques at all. By means of the R package aspect we provide a uniﬁed majorization-based implementation of this methodology. Using various data examples we will show the ﬂexibility of this approach and how the optimally scaled results can be represented using graphical tools provided by the package.

1. Introduction Gifi (1990) offers a comprehensive collection of nonlinear multivariate methods based on optimal scaling.The starting point of the underlying analysis is a 0-1 dummy matrix based on the data which are considered as categorical.Subsequently, a loss function involving the (unknown) object and category scores is established.During the iterations we stretch/squeeze the variables and compute category scores such that they are optimal in the sense of a minimal loss function.This procedure is referred to as optimal scaling.The simplest method is homogeneity analysis which is also known as multiple correspondence analysis.By imposing restrictions on the quantification ranks we get nonlinear principal component analysis and by defining sets nonlinear multi-set canonical correlation analysis.More detailed descriptions with a strong computational background and, correspondingly, the presentation of the R (R Development Core Team 2009) package homals are given in de Leeuw and Mair (2007a).
In this paper, which is also part of our PsychoR project (http://r-forge.r-project.org/projects/psychor/), we somewhat extend the Gifi approach in terms of a general system for multivariate analysis which we denote as the aspect framework.The theoretical foundation of this approach was already given by de Leeuw (1988b), some of it also in de Leeuw (1982), and further discussed in de Leeuw (1988a), de Leeuw (1993), and de Leeuw, Michailidis, and Wang (1999).Essentially, we can subdivide this framework into two parts: optimization of functions of the correlation matrix on the one hand, and optimization of non-correlational loss functions on the other hand.In the first case we speak of correlational aspects.In the second case, where we mainly focus on linearizing regressions (LINEALS), we speak of non-correlational aspects.Eventually we get optimally scaled unidimensional category quantifications (scores).These scores and the corresponding correlation matrix, respectively, can be used for further multivariate modeling.
As de Leeuw et al. (1999) point out, in many situations (e.g., social and behavioral sciences) we do not know exactly how to express the variables.For instance, we can think of regression examples where we can apply logarithmic, exponential, or square root transformations to the data.In the Gifi (and in our aspect) framework, the transformations are unknown: We scale (i.e., transform or quantify) the categories with respect to a certain criterion.In the particular aspect approach we select an target function, investigate how this function varies over all feasible transformations of the variables, and, finally, quantify the categories accordingly.
In the next sections we start with some basic definitions, present correlational aspects and their optimization by means of majorization.Then, as a particular non-correlational aspect, we focus on LINEALS, discuss implications of this approach and focus on LINEALS as a structural equation models (SEM) preprocessing tool for categorical data.The applied part of the paper focuses on the R package aspect by means of various examples.

Cones in n-space: The theoretical framework
Let us assume that our data are collected in a n × m matrix H = (h 1 , h 2 , . . ., h m ) with i = 1, . . .n and j = 1, . . ., m. From a general perspective, multivariate analysis involves computations on m random variables H 1 , H 2 , . . ., H m , which we can collect into a so called multivariable.An aspect φ of a multivariable is a well-defined function that is used to measure how well the multivariable satisfies some desirable criterion (de Leeuw et al. 1999, p. 529).At this point we can think of many different criteria; thus, many different aspects can be defined.
So far we did not mention that our approach will rescale the categories of each h j by means of optimal scaling.A convenient mathematical formulation is offered by the concept of cones.Note that we are not only interested in the variables as we observe them but rather in transformations of the observed data (monotonic, polynomial, "splinical").If one such transformation or re-expression of the variable suits our criterion better than the original expression, we use the transformed version (de Leeuw 1988a).So let us assume that the observed h j and its transformations are in a known closed convex cone K j , which is a subset of R n .Hence, in our multivariate setting, we have the cones K 1 , K 2 , . . .K m .Various cones are important: K can be a polyhedral convex cone of monotone transformations, a linear subspace of low-order polynomial transformations, a linear subspace for imputation of missing data, or a whole space for latent variables.A thorough historical discussion of the cone concept in connection with well-known multivariate techniques can be found in de Leeuw (1988b).

Dummy matrix and category scores: Practical issues
As mentioned above, we emphasize optimal scaling of categorical variables where each variable H j has number of categories k j .For each variable we define an indicator or dummy matrix G j of dimension n × k j .Within each row i we have one non-zero element indicating in which category of variable j observation i falls.These indicator matrices can be collected in an indicator supermatrix The contingency table of variables j and l is C jl = G j G l .Similar to G we can define the K × K supermatrix C which is known as the Burt matrix.These notations bring us closer to correspondence analysis (CA) which, within the context of our PsychoR project, is described in de Leeuw and Mair (2007b).We think of CA and related methods mainly in an analytical way, as opposed to the (more common) geometrical approach.
The crucial part of the whole computation is the determination of the optimally scaled category scores y j for each variable.The y j are vectors of length k j which are normalized, either to y j D j y j = 1 or to y j D j y j = N where D j is the diagonal matrix with the univariate margins on the diagonal (i.e.D j = C jj ).Moreover we require y j ∈ K j , where K j is a suitable cone in R k j .Consequently are the elements of the covariance matrix S(Y) of the transformed variables.As quoted in the last section we work with cones and to exclude trivial solutions we have to impose normalization.This implies that, rather then using S(Y), for correlational aspects we will use the correlation matrix R(Y) with its elements r jl .
An additional task in optimal scaling are restrictions on the scale levels of the variables.Basically, we regard each variable as categorical.A nominal scale level involves no restrictions on the resulting scores.An ordinal scale level requires monotonicity and numerical variables additionally require equidistance between the scores.For Gifi methods the formulas are given in de Leeuw and Mair (2007a) and, consequently, implemented in the homals package.In our aspect framework the task it quite simple.Regardless whether we have correlational or noncorrelational aspects, all that is needed to impose ordinal scale levels is to perform isotone regression within each iteration of the optimization.We use the Pool-Adjacent-Violators Algorithm (PAVA; see e.g.Barlow, Bartholomew, Bremner, and Brunk 1972) which is implemented in the isotone package (de Leeuw, Hornik, and Mair 2009).This guarantees monotonically increasing category scores.

Types of correlational aspects
First we examine aspects directly defined on the correlation matrix.Formally, we study problems of the form φ(R(Y)).If we have only m = 2 variables we only have one correlation coefficient and, therefore only one aspect we can study: The variation of the correlation coefficient under the choice of category quantification.References and discussions for the m = 2 can be found in de Leeuw (1983b).
Here we focus on the general case of a multivariate dataset with m variables.Basically, the aspect φ defines the criterion to be optimized and Y = (y 1 , y 2 , . . ., y m ) with y j as the vector of category scores to be computed.From a general point of view the optimization problem can be formulated as In other words: We scale the observed variables (categories) in a manner such that an aspect of choice on the correlation matrix, which is based on these scores, is maximized.
At this point we specify various aspects based on R(Y) which are referred to as correlational aspects.Note that the following listing is just a collection of possible correlational aspects.
In fact, the aspect package allows the user to specify his own aspects.For the following convex correlational aspects, which are pre-specified in the package, we also give the first order derivatives needed for optimization.The aspect specification in the package (by means of the aspect argument; see Section 5) is given in parenthesis.
Sum of correlations ("aspectSum"): We can define a very simple aspect by taking the sum of the correlations.Optionally the correlations can be transformed by an even power q > 1. Formally, the aspect and its derivative can be expressed as where r jl are the elements of the correlation matrix R. Trivially, for q = 1 the elements of the matrix of partial derivatives ∂φ/∂R are all 1.This special case corresponds to the SUMCOR model proposed by Horst (1961bHorst ( , 1965)); the q = 2 case to the SSQCOR method in Kettenring (1971).
Sum of absolute correlations ("aspectAbs"): This aspect is based on the sum of absolute correlations (again, with the option of including power q, which now does not have to be even) and can be expressed as Of course we need to take suitable measures if one of the correlations is zero, because in that case the derivative will not exist.
Eigenvalue aspects ("aspectEigen"): The basic definition of an eigenvalue aspect is to maximize the largest eigenvalue λ of R. The finding of such quantifications is the same as finding the dominant eigenvalue in MCA.In literature it was pointed out repeatedly (e.g. de Leeuw 2006) that the first MCA dimension should be distinguished clearly from the others because the remaining solutions provide additional (suboptimal) solutions of the stationary equations.In other words, maximizing the first eigenvalue implies that we want to scale the variables in a way that they are as uni-dimensional as possible.In the psychometric literature this approach is sometimes referred to as MAXVAR (Horst 1961a(Horst , 1965)).As a multidimensional generalization, we can look at the sum of the first p eigenvalues where Ẑ is the m × p matrix of the first p eigenvectors.This generalization is used in nonlinear PCA (cf.de Leeuw 2006; de Leeuw and Mair 2007a).The aspect is convex because it is the pointwise maximum of a family of linear functions.
Determinant aspects ("aspectDeterminant"): This aspect, which is related to the multinormal negative log-likelihood, corresponds to the GENVAR criterion by Steel (1951).It can be expressed as The aspect log(det(R)) is concave, because it is the minimum over Γ of the linear functions log(det(Γ)) + tr(Γ −1 R), and thus its negative is again convex.This aspect can also be defined using covariances instead of correlations (de Leeuw 1988b).The covariance-based version is related to the Box-Cox approach in regression (Box and Cox 1964) and can be used for structural relation models.The latter relationship is studied extensively in Meijerink (1996).
Squared multiple correlations ("aspectSMC"): One variable j is defined as target (outcome, response) variable.The squared multiple correlation (SMC) of the m−1 remaining variables and its derivative can be written as where b . .b m is a vector of weights and restricted to have b j = 1.This apect is the multiple regression aspect (cf.de Leeuw 1986).Again it is convex, because it is a pointwise maximum of linear functions.
Sum of squared multiple correlations ("aspectSumSMC"): The previous aspect can be extended in a way such that we compute SMC's between various combinations of indicators and target variables and take the sum of the resulting SMC's.Let us define J as an arbitrary index set for j = 1, . . .m. Thus (7) generalizes to This aspect can be used for path analysis where J is determined by targets in the path model.Note that it is not necessary for b (j) to contain all remaining predictors but rather those determining a path to target j.If J is the full index set and each predictor is regarded in each SMC (in other words we take the sum over all m SMCs), this results in image analysis discussed in Guttman (1953).The latter version is implemented in our package.
These are the correlational aspects pre-specified in the aspect package.More aspects can be found in de Leeuw (1988b) and de Leeuw et al. (1999).As mentioned above, the package allows for the specification of user-defined aspects by means of functions which must return the function value of φ(R(Y)) and the derivatives ∂φ/∂R.

Optimization using majorization
This optimization problem is tackled by a majorization algorithm (de Leeuw 1994).Within our PsychoR project the majorization principle is introduced within the context of multidimensional scaling (MDS) in the SMACOF paper (de Leeuw and Mair 2008).Here we will just quote briefly the most important results from de Leeuw (1988b) and de Leeuw et al.
(1999) which are relevant for our aspect framework.
Majorization is a very general optimization approach.It is based on the principle that we have to find a minimum of a complicated function f (x).The idea is to find a simple surrogate function g(x, y), which for fixed y, is larger than f (x): We say that g majorizes f .This surrogate function is then subject of optimization.In our correlational aspect setting φ(R(Y)) has to be maximized.We have to make sure that φ is convex.As we indicated above, most of the interesting aspects fulfill these assumptions (de Leeuw 1988b).
Within each majorization step k we have to update the category scores y j for a particular variable j at a time, recompute the aspect including it derivatives ∂φ/∂r jl and then update y j for the subsequent variable j.At the optimum, the following stationary equation holds (de Leeuw 1988a): If the convex aspect is not differentiable we can substitute any subgradient for the partial derivatives.The corresponding Langrange multipliers λ j are Based on theorems in de Leeuw (1988b) and de Leeuw et al. (1999) the update for the category quantifications for variable j is given by where PROJ j is projection on the cone K j .Within each iteration k we update the category scores using (11a).The resulting y j (k+1) have to be projected and normalized by means of (11b) such that y (k+1) j D j y (k+1) j = n.We perform these steps for each variable j and convergence is reached when Convergence is proved (for convex aspects) in de Leeuw (1988b).Eventually, we get optimal scored categories (optimal with respect to the corresponding aspect) which we can organize as n × m score data matrix.This score matrix, or the corresponding correlation matrix based on Pearson correlations, can be used for subsequent analyzes.

Non-correlational aspects: Linearizing regressions 4.1. Bilinearizability
Before we describe the LINEALS approach we elaborate general statements about the property of bilinearizability.Bilinearizability means that we can find transformations of the variables such that all bivariate regressions are exactly linear.In the following paragraphs we present some conditions, implications, and consequences of bilinearizability.In these discussions we must clearly distinguish between bilinearizability as a population characteristic and as a sample characteristic.
Having m = 2 variables only, linearization is achieved if (see Hirschfeld 1935) This equation system can be solved by singular value decomposition (SVD) of . In fact, this is what simple CA does (see e.g. de Leeuw and Mair 2007b).In other words, if we perform optimal scaling for the m = 2 case (e.g. using any aspect), we always achieve bilinearizability.This is true for any discrete bivariate distribution, either population or sample.
In the m > 2 case things become more complicated.For this case, requiring bivariate linearity amounts to (cf. de Leeuw 1982) C jl y l = r jl D j y j , where j and l represent variable indices.The crucial question at this point is: Under which conditions can we achieve bivariate linearization for the m > 2 case?To examine this question we can summarize the following statements about bilinearizability, based on elaborations in de Leeuw's preceeding papers : The m = 2 case: As mentioned above, simple CA linearizes the regressions by means of SVD.
Binary variables: This case is trivial since having two categories only we always achieve linearization (Bekker and de Leeuw 1988, p.27).
Bilinearizable distributions: Assuming bilinearizability of the population distribution is weaker than assuming multivariate normality or multivariate ellipticity (de Leeuw 1988a, p. 446).Bilinearizability also applies to mixtures of standardized multinormals or to Yule's strained multinormal1 family.For Yule's family, aspect-based techniques "unstrain" the distribution by finding the inverse transformations which makes the distribution of the variables multinormal (de Leeuw 1993; de Leeuw et al. 1999).
Complete bilinearizability: In this even more restrictive case, bilinearizability holds for each dimension of an MCA solution.The conditions are that there exist matrices with Γ jl diagonal.The continuous multinormal statisfies these equations with Y j the Hermite polynomials.
In a data matrix with bilinearizability it does not matter which aspect we choose because they all give the same transformations (de Leeuw 2006, p. 120).Or, more precisely, if a system of transformations satisfies the stationary equations for one correlational aspect, it satisfies the stationary equations for other aspects as well.
If bilinearizability holds in the population, the sample correlation coefficients computed by maximizing an aspect have the same standard errors as the sample correlation coefficients computed from known scores (de Leeuw 2006, p. 120).This property will be examined more detailed in Section 4.3.
The most important implication is based on expression ( 9): If we substitute the bilinearizability condition ( 14) into ( 9) we find that the linearizing y's satisfy the stationary equation with the corresponding expression (10) for the Lagrange multiplier.This proves the following: If linearizing transformations exist, they will be found by optimal scaling techniques (de Leeuw 1988a, p. 447).Of course if they exist in the population distribution, then they will exist approximately in the sample.But if bilinearizability can be achieved, the corresponding scores can be found by any aspect optimization.The degree of linearization can be examined by means of regression plots (see Section 5), or by bootstrap approaches similar to van der Burg and de Leeuw (1988).Regression plots can also be used as diagnostics to study deviations, for example, from multivariate normality.
A concluding remark concerns the importance of bivariate linear regressions within the aspect framework.As mentioned above, it guarantees that different aspects lead to the same quantifications.As a theoretical consequence, this implies that classical asymptotic tests and estimates can still be used on the transformed data.We will further investigate this issue in Section 4.3 when using LINEALS for SEM preprocessing.

LINEALS formulation and its optimization
In the listing above we state that if bilinearizability is achieved it does not matter which aspect we choose.However, we do not know a priori whether we find linearizing scores or not.Therefore, a somewhat natural aspect is desirable which optimizes a bilinearizability criterion.Such a criterion can be established by means of the following two components (de Leeuw 1988a): On the one hand we have the elements r jl of the Pearson correlation matrix R which measure the linear association between the variables.On the other hand we can use Pearson's correlation ratio for judging non-linear relationships.Note that the correlation ratio is not symmetric.In the case of exact bilinearizability, η 2 jl − r 2 jl = 0. Now we can define the (non-correlational) aspect which transforms the variables by minimizing the sum of differences between the correlation ratios and the squared correlation coefficients.Equation 16is also known as LINEALS (de Leeuw 1988a;van Rijckevorsel 1987).Stating the loss function in this way tackles the linearizing regressions problem in a direct manner unlike correlational aspects or various homals versions.
Eventually, the value of the LINEALS loss function can be used as total discrepancy index.
The smaller the value the closer we achieve bilinearizability.Again, in the case of perfect bilinearizability we get the same optimal scaling with any method of choice and the discrepancy index becomes 0. Perfect bilinearizability is not realistic for real-life applications.In fact, de Leeuw et al. (1999) mention that we may be able to find approximate bilinearizability in ordinal data but it is very unlikely in purely nominal variables.However, minimization of ( 16) leads to a lower discrepancy compared to other optimal scaling techniques.Therefore, if the aim is to achieve bilinearization, LINEALS should be considered.The difference between the squared correlation coefficients and squared correlation ratio's can also be used to diagnose deviations from linearity.
Minimizing ( 16) is relatively simple and no majorization is required.Basically we optimize (16) over one y j at the time, keeping all others fixed at current values.Each of these subproblems is a generalized eigenvalue problem (de Leeuw 1988a).Note that in the current implementation there are no cone restrictions on the transformations.
To update the category scores y j for a fixed variable j we compute the k j × k j matrix and its normalized version The score update is given by Analogous to the majorization for correlational aspects in Section 3.2, these computations are performed for each variable j separately until |φ(Y (k+1) ) − φ(Y (k) )| < .

LINEALS as SEM preprocessing
In the case of categorical (ordinal) data, SEM software such as EQS (Bentler 1995) and LISREL (Jöreskog and Sörbom 2005) computes the polychoric correlation matrix based on the indicators.EQS uses the Lee, Poon, and Bentler (1992) approach whereas LISREL provides the PRELIS module (Jöreskog 2005).Note that polychoric correlations require an underlying normality assumption.Meijerink (1996) combines optimal scaling with SEM.He proposes an ML estimation for monotonously transformed numerical indicators.The assumption is that by means of the nonlinear transformation multinormality is achieved.This 1-step estimation approach unifies Gifi-based optimal scaling and path modeling.
The starting point of our 2-step LINEALS-SEM approach is the computation of the induced correlations based on the resulting scores y j : Now let F denote a corresponding probability distribution.Applying the delta method for determining the standard errors, de Leeuw (1988b) shows that if the population distribution satisfies bilinearizability, the derivatives of y j and y l with respect to F vanish from the expression.This generalizes results from Steiger and Browne (1984) who show that if the scores y j and y l are chosen in a way such that they maximize the correlation coefficient, the distribution of the optimal correlation coefficient (i.e., our induced correlation) is the same as the distribution of the ordinary correlation coefficient between G j y j and G l y l with the scores considered as fixed numbers (de Leeuw 1988b).Hence, the asymptotic distribution of the correlation coefficients can be computed as if the optimal transformations are known a-priori (i.e.non-random).
In order to avoid any distribution assumption we can use the formulas given in Isserlis (1916) to compute the fourth moment weight matrix which includes standard errors and covariances (see also de Leeuw 1988a, p. 450): w jl,uv = r jl,uv − 1 2 r jl (r jj,uv + r ll,uv ) − 1 2 r uv (r uu,jl + r vv,jl ) + 1 4 r jl r uv (r jj,uu + r jj,vv + r lluu + r llvv ) (21) with In (22a), the s •• represent the sample variances whereas in (22b), the y i• represent the i-th element of the resulting score matrix and y • the corresponding mean.
Now assume that we estimate the induced correlation matrix with LINEALS which, in turn, acts as SEM input.If the SEM software offers an asymptotical distribution free (ADF) approach to estimate the parameters (Browne 1984), we get consistent and efficient estimates of the structural parameters if the population distribution of the observed variables is bilinearizable.As mentioned above, this theory covers elliptic distributions, mixtures of standard multinormals, Yule's strained multinormals, etc. Relaxations of the multinormal assumption affecting correlation-based methods is studied in de Leeuw (1983a).Elaborations concerning the connection with the likelihood theory for multinormal distributed data can be found in de Leeuw (1988b).

Description of main functionalities
According to the explanations in Sections 3 and 4 the aspect package consists of two main functions: corAspect() and lineals().The corAspect() function for computing correlational aspects provides the argument aspect which the user to specify the type of correlational aspect (according to the listing in Section 3.1 where we also gave the string specifications for the argument).If additional parameters are needed (e.g., exponent, number of eigenvalues, target variable, etc.), they can be passed through "..." (see corresponding help file).Alternatively, the user can specify his own correlational aspect and pass the whole function to the aspect argument.Again, an example can be found in the corAspect() help file.In addition, by means of the level argument the scale level for each variable can be defined separately (i.e., mixed scale levels are allowed).
Both functions return the final value of the target function, the number of iterations, the resulting (optimally scored) category quantifications, the correlation matrix, and the Burt matrix.Note that corAspect() returns additionally the eigenvalues of the correlation matrix and lineals() returns the matrix of correlation ratios.Both functions return an object of class "aspect" where print, summary, and plot methods (transformation plot, regression plot) are provided.

Correlational aspects: Internet terminal data
In this section we give examples for correlational aspects using part of the data collected in Wurzer (2006).The dataset is about the use of public Internet terminals in Vienna.The eight items we use are the following: Do you know at least one place where you can find such a terminal?(yes/no) Have you already used such a terminal?(yes/no) How often do you use the Internet on each of the following locations: home, work, cafe, terminal, cellphone?(5-point scales; see below) Which of the following descriptions fits you best?(I'm here on vacation/I am from here/I'm here on business travel) The 5-point items we have the following categories: daily (1), almost daily (2), several times a week (3), several times a month (3), once a month (4), less frequently (5).As we see, the first two items are nominal (dichotomous), the next five are ordinal, and the last one is nominal (polytomous) again.
The first example uses the largest eigenvalue aspect which implies that we are basically doing a multiple CA scaling.The level restrictions on the variables are imposed according to their scale levels.
Next, we compute the SMC aspect by setting "terminal-used" as target variable.

LINEALS-SEM example
The data we use to show how LINEALS can be used within an SEM context (as described in Section 4.3) were collected by Duncan, Duncan, and Hops (1998).The duration of this longitudinal study was 5 years and the number of subjects (adolescents) n = 1204.At 4 points in time the participants were asked to rate cigarette and marijuana consumption on a 5-point scale: (1) Never consumed.
(2) Previous but no use over the last 6 months.
(3) Current use of less than four times a month.
(4) Current use of between 4 and 29 times a month.
(5) Current use of 30 or more times a month.Note that this dataset, which is included in the package, provides also the amount alcohol comsumption.For illustrational issues we only use marijuana (POT T1, ..., POT T4) and cigarettes (CIG T1, ..., CIG T4).> require("sem") > require("polycor") > data(duncan) > duncan.pc<-duncan[, 1:8] > head(duncan.pc) On these data we will fit a 2-factor model using the classic polychoric solution on the one hand, and our LINEALS approach on the other hand.We start with the computation of the correlation matrices of the indicators.The polychoric correlation we estimate by means of the polycor package (Fox 2007) which uses ML.For both cases we first print out the resulting correlation matrices.
Before fitting the 2-factor model, we examine bilinearizability of the LINEALS solution which has a loss value of .248.Having 8 variables, we have 28 bivariate combinations.For illustration purposes we pick out 3 of them including the regression plot for the observed categories > plot(res.lin,plot.type= "regplot", plot.var= c(1, 2)) > plot(res.lin,plot.type= "regplot", plot.var= c(1, 5)) > plot(res.lin,plot.type= "regplot", plot.var= c(5, 6)) Figure 3 represents the bivariate regressions for marijuana consumption T1 vs. T2, for cigarette consumption T1 vs. T2, and for marijuana consumption T1 vs. cigarette consumption T1.On the left hand side we have the unscaled solution based on the original categories.Let us first look at the red lines.For the scores on the x-axis we compute the conditional expectations in y-direction (i.e., the expected scores conditional on the x-value).If we condition on the y-axis and compute the expected scores in x-direction, we get the blue lines.The plot to the left represent the unscaled solutions with the underlying frequency table.The plots on the right hand side show clearly how optimal scaling stretches and squeezes the initial category scores.In the case of perfect bilinearizability we would get straight lines.Now we start to specify our 2-factor model using the sem package (Fox 2006).The marijuana responses form the first factor; the cigarette responses the second factor.The loadings are denoted by λ, the measurement-error variances by θ, the variances of the latent variables by φ, and the correlation between the latent variables by ρ.Now for both correlation matrices R p and R l we fit the same 2-factor model.
> sem.pol <-sem(dpc.ram,Rp, N = 1204) > sem.lin <-sem(dpc.ram,Rl, N = 1204) Note that the sem package does not provide ADF estimation but rather uses ML assuming multinormality.This is somewhat unfortunate for the LINEALS approach since we did not want to impose any distribution assumption on our transformed data.Therefore, a better way would be to use SEM software such as EQS which does ADF.Since we are somewhat reluctant to leave the R environment, we have to deal with this shortcoming.However, for the purpose of illustration also ML does the job and the resulting parameter estimates and their standard errors are given in Table 1.
By looking at common fit indices such as the comparative fit index (CFI ;Bentler 1990) and the root mean squared error of approximation (RMSEA) we get a CFI p = .875and RMSEA p = .290for the polychoric R p .With LINEALS preprocessing and the resulting R l

Scaled Solution
POT_T1 scores POT_T2 scores q q q q q q q q q q 701 39 43

CIG_T1 vs. CIG_T2 (scaled)
CIG_T1 scores CIG_T2 scores q q q q q q q q q q 505 38 25  input matrix we get CFI l = .932and RMSEA l = .189.Taking into account common rules of thumb (e.g.CFI > .90)we could conclude that with LINEALS preprocessing we get a satisfactory fit whereas with polychoric correlations we do not.However, for this example we see that by LINEALS preprocessing we get a better fit.

Discussion
In this paper we presented the aspect methodology and a corresponding package in R. Aspect represents a comprehensive framework for (correlational) multivariate analysis.The aspects we presented are only a convenient set of functions to be optimized.The package is programmed in a way that the user can provide his own myAspect() functions as an argument.
Considering each variable as categorical is not very efficient when having many categories, as typically in the numerical case.Therefore, in a future update we will use splines to make it more efficient.The corresponding theory is given in several chapters in the edited volume by van Rijckevorsel and de Leeuw (1988).
Using LINEALS for SEM preprocessing of categorical data including level restrictions seems to be a promising approach.Though, simulation studies are needed for examining bilinearizability in practice.As mentioned above, we can expect approximate linearization in the situation of ordinal data.The situation of mixed scale levels, the effect of different number of categories, as well as examining the LINEALS performance for underlying multivariate normal and non-normal distribution assumptions are substantial practical questions to be studied.
An even more sophisticated approach for optimal scaling and SEM would be to provide a 1-step procedure.Meijerink (1996) presents such an approach for numerical variables based on Box-Cox type of methods (with spline transformations).We could define an aspect of the form φ(R(Y)) = min θ log det Γ(θ) + tr Γ −1 (θ)R(Y) with θ as structural parameters.Note that this is again concave in R. The derivative of this aspect is simply Γ −1 .This aspect function can be calculated by fitting any correlation structure model with the sem package or any other SEM software.Thus, just apply the SEM computations iteratively to optimally rescaled correlation matrices.The theory for the standard errors and Chi-square statistics as elaborated in Section 4.3 holds.

Figure 3 :
Figure 3: Regression plots for marijuana and cigarette bilinearizability