Inter-Battery Factor Analysis via PLS: The Missing Data Case

In this article we develop Inter-Battery Factor Analysis (IBA) by using PLS (Partial Least Squares) methods. As the PLS methods are algorithms that iterate until convergence, an adequate intervention in some of their stages provides a solution to problems such as missing data. Specifically, we take the iterative stage of PLS regression and implement the “available data” principle from the NIPALS (Non-linear estimation by Iterative Partial Least Squares) algorithm to allow the algorithmic development of IBA with missing data. We provide the basic elements to correctly analyse and interpret the results. This new algorithm for IBA, developed under the R programming environment, fundamentally executes iterative convergent sequences of orthogonal projections of vectors coupled with the available data, and works adequately on databases with or without missing data. To present the basic concepts of IBA and to cross-reference the results derived from the algorithmic application, we use the complete Linnerud database for the classical analysis; we then contaminate this database with a random sample, representing approximately 7% of the data, declared non-available (NA), for the analysis with missing data. We ascertain that the results obtained from the algorithm running with complete data are exactly the same as those obtained from the classic method for IBA, and that the results with missing data are similar. However, this might not always be the case, as it depends on how much the original factorial covariance structure is affected by the absence of information. As such, the interpretation is only valid in relation to the available data.


Introduction
Among the PLS methods created by Wold (1985), the most important are NIPALS, PLS-Regression (PLS-R) and PLS-Path Modeling (PLS-PM), which were designed for the treatment of one, two and k quantitative data matrices, respectively. PLS-R studies the relationship between two groups of variables X and Y even in the presence of multicollinearity, and has been applied with great success in fields such as Chemometrics, Sensometrics, Genetics and Medical Imaging (Pérez & González 2013), among others.
These PLS methods are convergent algorithms and, as such, they allow intervention in some of their stages or phases in order to optimally handle problems such as missing data or mixed data. For this reason, the development of PLS algorithms that replace classical methods like IBA is important (that being the main focus of this article), as happened with NIPALS (Wold 1966) for Principal Component Analysis (PCA) or GNM-NIPALS (Aluja & González 2014) for the treatment of a mixed data matrix.
In recent literature (Tenenhaus & Tenenhaus 2011), IBA is considered a special case of Regularized Generalized Canonical Correlation Analysis (RGCCA) for the optimization problem of two continuous data blocks, taking advantage of the flexibility of PLS-PM. See the rgcca() function from the RGCCA package (Tenenhaus & Guillemot 2013).
When studying the interrelations between two groups of variables X_{n,p} and Y_{n,q} via IBA (Tucker 1958, Tenenhaus 1998), it is frequent to find missing data. In such a case, it is not possible to apply the classical method without suppressing or estimating the individuals whose data are missing, as IBA requires the spectral decomposition of the product of the inter-group covariance matrices X′Y Y′X (see, for example, the interbat() function from the plsdepot package (Sanchez, G. 2012)).
However, the PLS-R algorithmic regression methods (Tenenhaus 1998) with one (PLS1) or multiple (PLS2) Y variables provide an alternative solution, as they are based on regression concepts. In effect, the procedure can be seen as an orthogonal projection between vectors of available data, following the basis of the NIPALS algorithm for missing data, without resorting to data imputation.
PLS2 investigates the components t_h and u_h in each group X and Y for each stage h = 1, 2, ..., s (s ≤ a), maximizing cov(t_h, u_h). These components, t_h = X_{h−1}a_h and u_h = Y_{h−1}b_h, are linear combinations of the variables of the respective groups. The ideas behind the PLS2 algorithm are taken up again in the convergence phase since, in the limit and through successive substitutions, the stationarity equations associated with the first stage of IBA are verified, yielding the first eigenvalue λ_1 associated with the product of the covariance matrices at each stage h.
After obtaining convergence of the orthonormal a_h, the matrices X_{h−1} − t_h p_h′ of the first group and Y_{h−1} − t_h b_h′ of the second group are deflated, both with respect to t_h, in order to proceed to the next iteration at stage h + 1 (see section 2.3.1).
However, these deflated matrices must be modified, taking the form X_{h−1} − t_h a_h′ in the first group and Y_{h−1} − u_h b_h′ in the second group. In this way, IBA and its properties are obtained via PLS, with the previous orthonormalization of b_h (see section 3).
In this article, a PLS algorithm for the IBA method is developed under the R environment, breaking the rigidity of the classical method and contributing a solution to the missing data problem. This problem is solved by adequately intervening in certain phases of the algorithm and implementing the available data principle, according to the NIPALS method.
In section two, the methodologies inherent to the process are presented. First, IBA is recapitulated; then the NIPALS and PLS2 methods are described, including the pseudo-algorithms, which are useful in the algorithmic solution proposed for IBA.
Section three ties together the basic concepts of the aforementioned procedures and proposes the basic structure of the algorithm, which executes classic IBA with complete data and an IBA with missing data (see the IBA R code in the Appendix).
Section four describes the application. First, the linnerud database, taken from the calibrate package (Graffelman 2013), is presented. This database is used for the application of the IBA algorithm, both with the complete data and with missing data, the latter being the result of randomly contaminating 7% of the data set (declaring those entries NA, not available, for the analysis). Subsequently, the results are presented, highlighting the equivalences with the classic IBA (complete data) and putting an emphasis on the analysis performed on missing data, without forgetting that these results, regardless of their likeness, must be interpreted solely from an available data starting point.
Finally, section five is dedicated to the main conclusions and recommendations derived from the study.We particularly highlight future investigations oriented towards IBA with missing data and mixed data that optimally quantify the qualitative variables from a k-dimensional function starting point, according to the GNM-NIPALS method (Aluja & González 2014).

Inter-Battery Factor Analysis
The starting points of Inter-Battery Analysis (developed by Tucker 1958) are two data sets X and Y, containing n individuals and p and q variables (columns) respectively, in which the components t_h = Xa_h and u_h = Yb_h are investigated. Each component explains its own group while being as correlated as possible with the component of the other group. The vectors a_h ∈ R^p and b_h ∈ R^q are imposed to be orthonormal.
The objective is then to maximize the covariance, that is, to simultaneously maximize the product of the standard deviations and the correlation: cov(Xa_h, Yb_h) = sd(Xa_h) sd(Yb_h) r(Xa_h, Yb_h). This method is, in itself, a compromise between the Canonical Analysis (CA) of X and Y, which maximizes r(Xa_h, Yb_h), and the Principal Component Analysis (PCA) of each group, which maximizes the variances. The variables are supposed to be centered and reduced; hence, the covariance, or intra-X correlation matrix, is R_11 = (1/n) X′X, and the inter-group matrix corresponds to R_12 = (1/n) X′Y, with R_21 = R_12′. Observe that if A contains every a_h then A′A = I.
Revista Colombiana de Estadística 39 (2016) 247-266

Optimal Solution
From the covariance cov(Xa_h, Yb_h) = a_h′R_12 b_h = ||a_h|| ||R_12 b_h|| cos(a_h, R_12 b_h), we can deduce that an optimal value is reached when the cosine equals 1, i.e., when the vector a_h is collinear with R_12 b_h, so that γ_h = ||R_12 b_h||.
In the same way, we reach an optimal value when the vector b_h is collinear with R_21 a_h and, with this taken into account, γ_h = ||R_21 a_h||.
Applying the Lagrangian to cov(Xa_h, Yb_h) under a_h′a_h = 1 and b_h′b_h = 1, we obtain the system L = a_h′R_12 b_h − λ(a_h′a_h − 1) − μ(b_h′b_h − 1), whose derivatives ∂L/∂a_h = 0 and ∂L/∂b_h = 0 lead to R_12 b_h = 2λ a_h and R_21 a_h = 2μ b_h. The previous system is relatively different from that found through CA. Premultiplying the two equations by a_h′ and b_h′ respectively, we obtain 2λ = 2μ = γ_h, therefore verifying the previously noted collinearities. The stationarity equations are R_12 R_21 a_h = γ_h² a_h and R_21 R_12 b_h = γ_h² b_h. That is to say, a_h is an eigenvector of order p of the symmetric matrix R_12 R_21, associated with the largest eigenvalue γ_h², guaranteeing maximum covariance. In this way, the a_h form an orthonormal base in R^p. Analogously, b_h is an eigenvector of R_21 R_12, associated with the same largest eigenvalue γ_h², and the b_h form an orthonormal base in R^q.
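The stationarity equations can be checked numerically. Below is a minimal R sketch of the classical, complete-data solution, using the singular value decomposition of R_12 (which solves both eigen-systems at once); the function name iba_classic and the argument m are ours, not part of the paper's appendix.

```r
# Classical IBA sketch for complete, centered-reduced data (names assumed).
iba_classic <- function(X, Y, m = 2) {
  R12 <- cor(X, Y)                      # inter-group correlation matrix R_12
  sv  <- svd(R12, nu = m, nv = m)       # SVD: u, v solve both eigen-systems
  a   <- sv$u                           # eigenvectors of R_12 R_21
  b   <- sv$v                           # eigenvectors of R_21 R_12
  list(a = a, b = b,
       t = scale(X) %*% a,              # components t_h = X a_h
       u = scale(Y) %*% b,              # components u_h = Y b_h
       gamma2 = sv$d[seq_len(m)]^2)     # squared covariances gamma_h^2
}
```

The columns of a and b are orthonormal by construction, which is the principal restriction of IBA discussed below.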

Properties of the t h and u h Components
• The t_h components of the same group are not orthogonal: from (4), in general t_h′t_l ≠ 0 (and analogously u_h′u_l ≠ 0), and this must be taken into account when calculating the explained variances or redundancies.
• The t_h and u_l components of different groups and orders are orthogonal, given that t_h′u_l = a_h′X′Y b_l = n γ_l a_h′a_l = 0 for h ≠ l.
• The interpretation of the components starts from (3): a_h = R_12 b_h / γ_h = (1/(n γ_h)) X′u_h, so a_h is collinear with the vector of correlations between the X_j and u_h. In a similar fashion, b_h is collinear with the vector of correlations between the Y_k and t_h.
There is coherence between the variable coefficients and the correlations between variables of one group and the components of the other group.

Decomposing the Correlation Matrix R 12
As in PCA, the reconstitution of R_12 in (4) is given by R_12 = Σ_{h=1}^{s} γ_h a_h b_h′ and, through (5) and (6), this can be written in terms of the correlations between the variables of each group and the components of the other group. The inter-group correlation matrix can thus be visualized using these correlations.
In addition, the best least-squares approximation of R_12 of dimensions p, q and rank m is obtained through R_12^(m) = Σ_{h=1}^{m} γ_h a_h b_h′. We can measure the quality of the approximation, thus defining the number of components to be retained, since these norms are calculated in terms of the eigenvalues: ||R_12^(m)||² / ||R_12||² = Σ_{h=1}^{m} γ_h² / Σ_{h=1}^{s} γ_h².
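Since the norms depend on the eigenvalues alone, the quality of the rank-m approximation reduces to a ratio of sums. A hypothetical one-line helper (the name approx_quality is ours):

```r
# Share of ||R_12||^2 recovered by the rank-m approximation, computed
# from the squared covariances gamma_h^2 alone (function name assumed).
approx_quality <- function(gamma2, m) sum(gamma2[1:m]) / sum(gamma2)
```

For instance, with the eigenvalues reported later for the complete linnerud data (1.27243, 0.00566, 0.00111), the first component alone retains more than 99% of the squared norm of R_12.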

Relation Between the Two Variable Sets: Factorial Structure
In this section we describe the principal elements to be retained for the analysis of the results. Firstly, the γ_h² values and the eigenvectors a_h, b_h are associated with the matrix R_12 R_21 and, therefore, with the t_h and u_h components. It can be verified that the sums of the variances over all h are p and q, respectively.
As we have the correlations between the components of both groups r(t h , u h ) and the factorial structure, we can reconstitute R 12 starting from m components according to (7).The factorial structure refers to the correlations r(x j , t h ), r(x j , u h ), r(y k , u h ), and r(y k , t h ).
We then obtain the explained variance parts and the communalities; due to the correlation between the components, this calculation must be made with the help of regression.

• Intra-Group Communality
We can measure the part of the variance of each variable explained by the m canonical components to be retained; these indexes are called communalities, as in factorial analysis. The intra-X communality with m components is defined as R²(x_j; t_1, ..., t_m): we calculate the variance of X_j explained by t_1; (t_1, t_2); ...; (t_1, t_2, ..., t_m).
As in PCA, we have the reconstitution X̂ = Σ_{h=1}^{m} t_h a_h′, and so the variable X_j can be estimated by regression on the components. When performing the regression with m components we obtain X̂_j = β̂_1 t_1 + ... + β̂_m t_m. When m = 1, this corresponds to a simple regression, in which β̂_1 = X_j′t_1; with m = s, the estimation is exactly the same as that of the PCA, because the coefficients match: a_1j = β̂_1, ..., a_sj = β̂_s. In any such regression, v(X̂_j) is obtained, and it measures the percentage of the variability of X_j which is explained by the regression X̂_j. Since v(X_j) = 1, v(X̂_j) then represents the intra-group communality of X_j in the m components. As such, we need to execute as many progressive regressions as the number of components we have, that is to say with t_1; (t_1, t_2); ...
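These progressive regressions are straightforward to script. A sketch in R, assuming X holds one group's variables and T the matrix of its retained components (the names communality_intra and T are ours):

```r
# Intra-group communality of each X_j over t_1; (t_1, t_2); ...; (t_1..t_m),
# via progressive regressions without intercept (names assumed).
communality_intra <- function(X, T, m = ncol(T)) {
  sapply(seq_len(m), function(h) {
    apply(scale(X), 2, function(xj) {
      fit <- lm(xj ~ T[, 1:h, drop = FALSE] - 1)  # regression without intercept
      summary(fit)$r.squared                      # explained variance part
    })
  })
}
```

Column h of the result holds R²(x_j; t_1, ..., t_h), so the last column gives the intra-group communalities with m components.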
As before, these coefficients are obtained from as many progressive regressions as u_h components we have.
Variables with weak intra-group communality do not participate much in the study, as they are not particularly related to the active variables of the other group.
• Inter-Group Communality
It is defined as the cross variance, that is to say, the variance of each variable explained by the m components of the other group: R²(x_j; u_1, ..., u_m) and R²(y_k; t_1, ..., t_m). The variables with little inter-group communality are specific to their own group and are not very related to the other group; these variables can be suppressed without perturbing the analysis.

NIPALS Algorithm
This algorithm is the base of the PLS regression (Wold 1966). It fundamentally executes the singular value decomposition of a data matrix through the use of convergent iterative sequences of orthogonal projections (the geometric concept of simple regression). With complete databases, the results are equivalent to those found using PCA; however, and this is probably its greatest virtue, it can execute PCA even with missing data and obtain its estimations starting from the reconstituted data matrix.
If X_{n,p} is the data matrix of rank a ≤ p, whose columns X_1, ..., X_p are supposed to be centered or standardized (under S_n), the reconstitution derived from the PCA leads to X = Σ_{h=1}^{a} t_h p_h′, where t_h is the principal component (scores) and p_h the eigenvector (loadings) on the h axis.
In this way, column X_j = Σ_{h=1}^{a} p_hj t_h with j = 1, ..., p, and the ith row x_i′ = Σ_{h=1}^{a} t_hi p_h′ with i = 1, ..., n.
It can be observed then that if h = 1, column j is expressed as X_j = p_1j t_1, that is, p_1j = X_j′t_1 acts like the coefficient (slope) in the regression (without intercept) of X_j over t_1. In the space of rows, t_1i is the constant-less regression coefficient of the individual x_i over p_1.
If h > 1, p_hj is the regression coefficient of t_h in the simple regression of the deflated vector X_j − Σ_{l=1}^{h−1} p_lj t_l over t_h, and t_hi is the coefficient of p_h in the regression of x_i − Σ_{l=1}^{h−1} t_li p_l over p_h.
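The alternating regressions described above can be sketched as a complete-data NIPALS in R (the function name nipals and its defaults are ours); with complete data, the loadings coincide, up to sign, with the PCA loadings, as the text states.

```r
# Complete-data NIPALS sketch: alternating simple regressions with deflation.
nipals <- function(X, a = 2, tol = 1e-9, max_it = 500) {
  X <- scale(X)
  Tm <- NULL; P <- NULL
  for (h in seq_len(a)) {
    t <- X[, 1]                              # start from any column of X
    for (it in seq_len(max_it)) {
      p <- crossprod(X, t) / sum(t^2)        # slopes of each X_j over t
      p <- p / sqrt(sum(p^2))                # normalize the loadings
      t_new <- X %*% p                       # recompute the scores
      if (sqrt(sum((t_new - t)^2)) < tol) { t <- t_new; break }
      t <- t_new
    }
    Tm <- cbind(Tm, t); P <- cbind(P, p)
    X <- X - t %*% t(p)                      # deflation X_h = X_{h-1} - t_h p_h'
  }
  list(scores = Tm, loadings = P)
}
```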

NIPALS Pseudo-Code
For h = 1, the algorithm starts by taking any column of the matrix X as the first principal component t_1, in order to immediately calculate a normalized p_1 and then recalculate t_1 in an iterative process until p_1 converges. After that, at each stage h = 2, ..., a, the deflated matrix X_h = X_{h−1} − t_h p_h′ is built; from it we take t_h, orthogonal to t_{h−1}, in order to start the convergence process of p_h, orthonormal to p_{h−1}, according to the same convergence scheme with t_1, p_1 and X_1 replaced by t_h, p_h and X_h, respectively. NIPALS' main characteristic is that it works in terms of a series of scalar products of coupled elements. This allows the management of missing data, adding up only the available pairs in each operation. Geometrically, the procedure 'takes' the omitted elements as if they fell on the regression line: they are not leverage points.
The NIPALS pseudo-algorithm associated with missing data provides the basic elements to develop the IBA with missing data, in the sense of only executing the scalar products over the coupled available data, as described in stages 2.2.1 and 2.2.3.
• NIPALS pseudo-code with missing data
In stages 2.2.1 and 2.2.3 we calculate the slopes of the least-squares lines that pass through the origin of the cloud of points, over the available data. The p_hj and t_hi must preserve j and i in their positions, as well as the missing data characteristic given by x_ij, which can be expressed as NA (not available). This allows excellent management through R functions such as na.omit() at the moment of developing the corresponding script.
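The key operation is thus a single slope computed over the coupled available pairs. A minimal sketch, here using complete.cases() as an alternative to na.omit() (the function name slope_available is ours):

```r
# Slope of the least-squares line through the origin, computed only over
# the pairs (x_i, u_i) where both entries are available (name assumed).
slope_available <- function(x, u) {
  ok <- complete.cases(x, u)           # keep only the coupled available pairs
  sum(x[ok] * u[ok]) / sum(u[ok]^2)    # regression-through-origin coefficient
}
```

Applied column-wise, this yields the p_hj and t_hi of the pseudo-code without imputing any value.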
If Y is the matrix of dependent variables y_1, ..., y_r and X is the matrix of independent variables x_1, ..., x_p with rank a, over n individuals, and all the variables are centered and reduced, then there is the possibility of multicollinearity in the interior of each block; r and p may even be greater than n, and there is also the possibility of some missing data.
For now, we have two sets of variables Y and X, for which we assume that a latent relation between the two blocks exists. This can be explained by H ≤ a latent orthogonal components t_h (h = 1, 2, ..., H), which are obtained as linear combinations of the variables of the predictor set X. They are highly related to Y through its linear combination u_h = Y c_h. As such, the predictor and response matrices are decomposed as X = T_H P_H′ + X_H and Y = T_H C_H′ + Y_H, where P_H and C_H are the weight matrices containing the parameters of the model, and X_H and Y_H are the residual matrices representing the variability of the data unexplained by the models.

PLS2 Pseudo-Algorithm
There are numerous versions of the PLS2 algorithm that differ in the level of normalization chosen. Here we describe the classical PLS2 regression algorithm, taking into account missing data management in accordance with the principles extracted from NIPALS (Lindgren, Geladi & Wold 1993).
When there is missing data, we apply the principles of the NIPALS algorithm: the coordinates of the vectors w_h, t_h, c_h, u_h and p_h are calculated as the slopes of the least-squares lines passing through the origin, computed only over the available data.

Optimization Criteria
Convergence can be pinned down at stage 2. The cyclical relationships of this stage show that, in the limit, the vectors w_h, t_h, c_h and u_h, through successive replacements, verify equations of the form X_{h−1}′Y_{h−1}Y_{h−1}′X_{h−1} w_h = λ_h w_h (and analogously for t_h, c_h and u_h), where λ_h is the greatest common eigenvalue of these matrices, which have been divided by n − 1 to rescale the eigenvalues. Therefore, stage two corresponds to an application of the iterated power method for calculating the eigenvector of a matrix associated with its largest eigenvalue, for each h.
We can obtain the t_h and u_h components, which in the first stage correspond to Tucker's IBA, from the tables X_{h−1} and Y_{h−1}. On each stage h we investigate two normalized vectors w_h and c*_h, maximizing the criterion cov(X_{h−1}w_h, Y_{h−1}c*_h) or, globally, the criterion Σ_{h=1}^{s} cov(X_{h−1}w_h, Y_{h−1}c*_h), where the vector c_h is collinear with c*_h = c_h/||c_h|| and s ≤ a. We then build the deflated matrices X_h and Y_h in stages 4 and 5 as residuals of the regressions of X_{h−1} and Y_{h−1} over the component t_h. Observe that these deflations must be modified to obtain the orthonormality properties of both the a_h and the b_h, according to the IBA.

The IBA Algorithm Via PLS
This algorithm describes the relations between two data sets X and Y by maximizing the covariance between the latent components t_h and u_h of each set, respectively. Basically, we perform a spectral decomposition of X′Y Y′X and Y′X X′Y in order to obtain the respective h = 1, ..., H components (H = s).
The algorithm is built over the structure of the PLS2 procedure, changing the calculation of the vectors w_h, t_h, c_h and u_h for a_h, t_h, b_h and u_h respectively, with or without missing data (see the ej cycle in section 3.1). The convergence of these vectors at each stage h is quickly secured, usually in no more than 20 iterations; nonetheless, the ej cycle executes 100 iterations in order to leave some convenient room. We can set the threshold ε = 0.0001 so that if ||a_{h,j} − a_{h,j+1}|| < ε we can guarantee the convergence of a_h at the jth iteration, in order to continue with the next stage h.
Once the a_h, t_h, b_h and u_h vectors have converged, the initial matrices are deflated through X_0 − t_h a_h′ and Y_0 − u_h b_h′ in order to guarantee the orthonormality of the vectors a_h and b_h at the next stage h: this is the principal restriction of the IBA.
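One stage of this cycle can be sketched as follows, shown for the complete-data case for brevity (with NAs, each cross product would instead be computed only over the coupled available pairs). The names iba_pls_stage, Xd and Yd are ours; the paper's full function is fAIBna().

```r
# One stage h of the IBA-via-PLS cycle, with the modified deflations
# X_{h-1} - t_h a_h' and Y_{h-1} - u_h b_h' (names assumed; complete data).
iba_pls_stage <- function(Xh, Yh, eps = 1e-4, max_it = 100) {
  u <- Yh[, 1]                                     # arbitrary start, as in PLS2
  a_old <- rep(0, ncol(Xh))
  for (j in seq_len(max_it)) {
    a <- crossprod(Xh, u); a <- a / sqrt(sum(a^2)) # orthonormal a_h
    t <- Xh %*% a
    b <- crossprod(Yh, t); b <- b / sqrt(sum(b^2)) # orthonormal b_h
    u <- Yh %*% b
    if (sqrt(sum((a - a_old)^2)) < eps) break      # ||a_{h,j} - a_{h,j+1}|| < eps
    a_old <- a
  }
  list(a = a, t = t, b = b, u = u,
       Xd = Xh - t %*% t(a),                       # modified deflation of X
       Yd = Yh - u %*% t(b))                       # modified deflation of Y
}
```

The inner loop is the iterated power method of section 2.3.2, so a converges to the dominant eigenvector of X_{h−1}′Y_{h−1}Y_{h−1}′X_{h−1}.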
Observe in Appendix (IBA R Code) that in order to calculate these vectors we use the na.omit() function, which uses the coupled available data of the two vectors X j and u h .The scalar product between these two vectors allows us to obtain, for example, a h .
The algorithm inherits the properties described in 2.1.2 for the classic IBA. With missing data, these properties are guaranteed through the orthonormalization() function of the far library (see Appendix). The same process is analogously applied in the calculation of b_h.
Finally, through list(aH, tH, bH, uH, lH, rH), the algorithmic function named fAIBna returns these vectors along with the eigenvalues lH and the correlations rH between the components. The pseudo-code for the IBA with or without missing data is presented in section 3.1, and the R code is presented in the Appendix.

IBA: Pseudo-Code Algorithm
The algorithm is based on the principles proposed by the NIPALS algorithm with missing data, as well as on the structure proposed by the PLS2 algorithm. The only exception is the deflation stages, which have been adequately modified in order to maintain the IBA properties.

Application
The IBA via PLS algorithm (IBApls) is implemented and run using the linnerud and linnerudNA databases in order to study the relation between two matrices with or without missing data.

Linnerud Database
The linnerud database can be obtained from R's calibrate package. It contains the physical and exercise variables of 20 members of a gymnastics club. The database comprises two groups: the first matrix X contains the physical variables weight, height and pulse (Poids, Tail, Pouls), which will be related to the exercise variables contained in the matrix Y: traction, flexion and jumps (Tracti, Flexin and Sauts). Both matrices have dimension 20 × 3.
Table 1 presents the incomplete database (linnerudNA), in which approximately 7% of the data has been declared Not Available (NA).
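A contaminated copy of this kind can be produced as follows. This is a sketch assuming the calibrate package is installed; the seed is ours, so the NA positions will not reproduce Table 1 exactly.

```r
# Contaminate ~7% of the linnerud cells with NA (seed and names are ours).
library(calibrate)
data(linnerud)                     # assumed to expose a 20 x 6 data frame;
lin <- as.matrix(linnerud)         # adjust if the blocks are stored separately
set.seed(2016)
idx <- sample(length(lin), round(0.07 * length(lin)))  # ~7% of the cells
linnerudNA <- lin
linnerudNA[idx] <- NA              # declare the sampled cells Not Available
```

The first three columns of linnerudNA then form X and the last three form Y, exactly as in the complete-data run.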

Results
The application of this algorithm (see Appendix) through the fAIBna(Y,X) function to the complete linnerud database, formed by the X and Y subgroups, leads to the same results as those obtained by applying the classical IBA method (Tenenhaus 1998). The eigenvalues 1.27243, 0.00566 and 0.00111 correspond to the squared covariances γ_h² at stages h = 1, 2, 3. Tables 2 and 3 show the eigenvectors and the components associated with the classical IBA (complete data).
For the missing data case (NA) we use the linnerudNA database listed in section 4.1; as before, the first three columns make up X and the last three Y. Applying the fAIBna(Y,X) function to these matrices, we obtain the following results. The eigenvalues 1.17246, 0.00962 and 0.00138 are relatively similar to those obtained with classical IBA and, in Tables 4 and 5, which contain the eigenvectors and the components associated with the missing data IBA, we can also see a similarity with the classical IBA results. It can be seen in Table 6 that the correlations between components of different groups and dimensions are practically 0, despite the absence of data. These results are relatively similar to those obtained with the complete linnerud database; however, these factorial similarities will not always appear, as they depend on how much the 'original' matrix is affected by the absence of some data and how this absence influences the correlation structure. The results must be interpreted as a function of the available data.
Note that the correlations between the t_h and t•_h components, and between the u_h and u•_h components, are generally high, with or without missing data. We then proceed to calculate the correlations between each of the physical measures x_j and the first two components u_1, u_2. Analogously, Ŷ = Σ_α u_α b_α′; therefore, we can calculate the correlations between the exercise variables ŷ_k and the components t_1 and t_2. These correlations constitute the coordinates for axes 1 and 2.
Figure 2 portrays the inter-group correlation matrix R 12 .Axis 1 corresponds to the physical potential fundamentally expressed through the weight and height (poids, tail) of the subjects, attenuated by the pulse (pouls); axis 2, on the other hand, grades the global performance on the exercises, opposing the pushups and pullups (flex, tract) with the jumps (sauts).

Conclusions and Recommendations
• The IBA via the PLS (IBApls) method was developed, preserving all of its properties and optimization characteristics, and providing an algorithmic procedure under R that leaves aside the rigidity of the classical method.
• The IBApls was run with databases that had missing data, proving its functionality.The analysis was done under the available data principle as in NIPALS, without data imputation.
• The linnerud database was used to apply the IBApls with or without missing data.With the complete data set, the results are equivalent to those found using the classical IBA, and with approximately 7% of the data missing, the results are relatively similar.However, the analysis must be made as a function of the available data.
• Starting with the flexibility of the IBApls, it is possible to solve the mixed data (quantitative-qualitative variables) problem through the optimal GNM-NIPALS quantification criteria.
• With these solutions, it is possible to find an optimal, joint solution for IBA with mixed and missing data.


Figure 2: Correlations chart; variables vs. components of the other group.
For j = 1, ..., p: a_hj = Σ_{i: x_ij and u_hi exist} x_{h−1,ij} u_hi / Σ_{i: x_ij and u_hi exist} u_hi²
For k = 1, ..., q: b_hk = Σ_{i: y_ik and t_hi exist} y_{h−1,ik} t_hi / Σ_{i: y_ik and t_hi exist} t_hi²

Table 2: a_h and b_h eigenvectors, classical IBA.

Table 3: t_h and u_h components, classical IBA.

Table 4: a•_h and b•_h eigenvectors, missing data IBA.