A Methodology for Biplots Based on Bootstrapping with R

A biplot is a graphical representation of two-mode multivariate data based on markers for rows and columns, usually displayed in a two-dimensional space. These markers define parameters that help to interpret goodness of fit, quality of the representation, and variability and relationships between variables. However, such parameters are estimated as point values by the biplot, so no information on the accuracy of the corresponding estimators is obtained. We propose a graphical methodology, which may be considered an inferential version of a biplot, based on bootstrap confidence intervals for the mentioned parameters. We implement our methodology in an R package and validate it with simulated and real-world data.


Introduction and Bibliographical Review
Analyses of high-dimensional data matrices for individuals and variables can be performed using multivariate techniques, which reduce this dimensionality by projecting the data onto an optimal subspace, preserving the patterns of similarity between individuals and of covariation between variables. Differences among these techniques depend on the type of variables and on the metrics used in the respective subspaces. The biplot methods (biplots, in short) proposed by Gabriel (1971) are part of such techniques, but biplots did not spread as quickly as other multivariate techniques, due to the absence of software. Biplots are a graphical display in the context of principal component analysis (PCA, in short, and PC for principal components), jointly representing a multivariate data matrix by markers for its rows (individuals) and columns (variables), permitting interrelations between them to be captured visually in a low-dimensional space. Biplots allow us to perform description as well as modeling and diagnostics (Bradu & Gabriel 1978) and are a powerful data visualization tool that can be considered a multivariate version of the scatterplot, because biplots are usually displayed in a two-dimensional space. This is the classical biplot of Gabriel, which has two parts. First, it approximates the data matrix by a singular value decomposition (SVD). Then, this matrix is factorized to obtain a low-dimensional Euclidean map through row and column markers represented by points/vectors, similarly to the case of factorial correspondence analysis (CA). However, in biplots, the interpretation is based on the geometric properties of the scalar product between rows and columns, allowing an approximation of the data matrix elements to be obtained. Gower & Harding (1988), Gower (1992) and Gower & Hand (1996) provided a different focus from the classical biplot, ordering the individuals by scaling and then superimposing the variables so that a joint graphical interpretation is possible, as usual in biplots. The two most used biplots are known as GH and JK. Galindo (1986) proved that, with a suitable choice of the markers, it is possible to represent the rows and columns simultaneously on the same Euclidean space with an optimal quality, which is called the HJ biplot. Its coordinates for columns are the column markers of the GH biplot, and its coordinates for rows are the row markers of the JK biplot. The HJ biplot of Galindo has been applied in several fields. Orfao, González, San-Miguel, Ríos, Caballero, Sanz, Calmuntia, Galindo & López-Borrasca (1988) applied this biplot to histopathology; Rivas-Gonzalo, Gutiérrez, Polanco, Hebrero, Vicente-Villardón, Galindo & Santos-Buelga (1993) to enology; Mendes, Fernández-Gómez, Galindo, Morgado, Maranhão, Azeiteiro & Bacelar-Nicolau (2009) to limnology; Viloria, Gil, Durango & García (2012) to biotechnology; Díaz-Faes, González-Albo, Galindo & Bordons (2013) to bibliometry; García-Sánchez, Frías-Aceituno & Rodríguez-Domínguez (2013) to sociology; and Gallego-Álvarez, Galindo & Rodríguez-Rosa (2014) to sustainability. Another biplot is known as GGE, which displays the genotype main effect (G) and the genotype-by-environment interaction (GE) in two-way (two-mode) data. The GGE biplot originates from the graphical analysis of data from multi-environment variety trials. Technical details of the GH, HJ and JK biplots are provided in Section 2, and of the GGE biplot in Frutos, Galindo & Leiva (2014). Ter-Braak (1986) used biplots fitted with linear models in the context of direct gradient analysis, which allows a set of species to be ordered according to its relationship with a set of environmental variables. Gauch (1988) employed biplots for validating and selecting models when the interaction between genotype and environment is studied. Ter-Braak (1990) and Ter-Braak & Looman (1994) took advantage of the relationship between biplot and regression methods to introduce the biplot of the regression coefficient matrix and to propose a biplot based on reduced rank regression. Cárdenas & Galindo (2003) investigated the inferential aspects of biplots using generalized bilinear models, extending their fitting with external information for variables related to the exponential family. Vairinhos (2003) showed that biplots are an ideal basis for the development of a data mining system, because most data analysis techniques can be expressed as particular cases of biplots. Amaro, Vicente-Villardón & Galindo (2004) studied the properties of MANOVA biplots within the context of the multivariate general linear model, developing methods for their interpretation. Hernández (2005) studied the performance of biplots in the presence of outliers, and Ramírez, Vásquez, Camardiel, Pérez & Galindo (2005) proposed biplots to detect multicollinearity. As an alternative to multiple CA for the case of presence/absence variables associated with the binomial distribution, Vicente-Villardón, Galindo & Blázquez (2006) considered prediction biplots and applied them to biplots fitted by generalized linear regression, proposing logistic biplots, later extended by Demey, Vicente-Villardón, Galindo & Zambrano (2008). Bradu & Gabriel (1974) and Bradu & Gabriel (1978) studied the fitting of bilinear models in two-way tables, analyzing collinearity between rows and columns on the biplots. Gabriel & Zamir (1979) also worked on the fitting of these bilinear models, but proposed iterative techniques to obtain low-rank approximations using weighted least squares. Denis (1991), Falguerolles (1995), Choulakian (1996) and Gabriel (1998) used biplots to study interactions in two- and three-way tables. Gabriel (1998) developed diagnostics in models based on contingency tables. Sepúlveda, Vicente-Villardón & Galindo (2008) used biplots as a diagnostic tool for local dependence in latent class models.
Methods for three-way data analysis have been shown to be variants of the PCA of the two-way supermatrix, the two most common ones being: (i) TUCKER3 (Tucker 1966) and (ii) STATIS (L'Hermier des Plantes 1976). In (i), the data are summarized by three-mode components, and component loadings are yielded for their entities (individuals, sampling sites, etc.). In (ii), data are compared on several occasions (time instants) by a PCA linked into column vectors (variables) belonging to different occasions. Based on TUCKER3, Carlier & Kroonenberg (1996) generalized the SVD to a three-mode table, proposing interactive and joint biplots to capture the information from the data. The difference between these two biplots lies in how the initial data matrix is treated, because in the interactive biplot two modes are combined, whereas the joint biplot is conditioned on one of the modes. Martín-Rodríguez, Galindo & Vicente-Villardón (2002) proposed metabiplots following the meta-PC and procrustes methods, allowing biplots to be compared for studying individuals with variables, as an alternative to the interactive and joint biplots.
Vallejo-Arboleda, Vicente-Villardón & Galindo (2006) and Vallejo-Arboleda, Vicente-Villardón, Galindo, Fernández, Fernández & Bécares (2008) proposed canonical STATIS, a biplot for multi-table data. Frequently, multivariate data taken over multiple occasions produce a multi-table experiment. Neither the separate analysis of each occasion, using MANOVA or canonical analysis, nor the joint analysis using STATIS for multiple tables, is adequate to capture the real structure of the data matrices. This is because the former accounts for group structure, but not for time evolution, whereas the latter confounds between- and within-group variabilities. Canonical STATIS permits a data group structure to be accounted for, as well as time evolution on various occasions, by obtaining common or stable canonical variables across multiple occasions or data sets.
The main objective of our work is to introduce a new methodology for biplots based on bootstrapping. We implement it in a graphical user interface (GUI) package developed in the statistical software R (www.r-project.org), named biplotbootGUI. R is an integrated suite of software facilities for data manipulation, calculation and graphical display; see R-Team (2013).
The paper is organized as follows. In Section 2, we provide the technical background of this work. In Section 3, we propose a biplot methodology with bootstrapping and review the state of the art of the software developed for biplots. In addition, in this section, we detail the features of the biplotbootGUI package. In Section 4, we perform the numerical application of the proposed computational implementation by using simulated and real-world data sets. Finally, in Section 5, we sketch some discussions, conclusions and future works.

Background and Technical Preliminaries
In this section, we provide some technical preliminaries that facilitate the understanding of the results proposed in this paper.

Biplot Representations
Any I × J two-way data matrix X can be expressed as the product of two matrices: A, with I rows and S columns, and B, with S rows and J columns. If S is equal to two, then each row of A and each column of B have two values defining a point in a two-dimensional plot. When both the I rows of A and the J columns of B are displayed in a single graphical representation, this is called a biplot. Thus, a biplot is a graph of a matrix X of order I × J with row and column markers a_1, ..., a_I and b_1, ..., b_J, respectively, chosen in such a way that the inner product a_i'b_j is the element x_ij of X. The rows and columns of this marker matrix are coordinate points in a Euclidean space referred to the same orthogonal axes. A property of a biplot is that each of the I × J values can be recovered by viewing its I + J points, which is a display of a matrix of rank equal to two (rank-two). The decomposition of a matrix X into its components A and B is obtained via the SVD, yielding S PCs. A two-way data matrix rarely has rank two, so that approximating X by a rank-two matrix means that only the first two PCs are used for representing it. If these explain an important proportion of the total variability of X, then X is sufficiently approximated by a rank-two matrix and can be displayed in a biplot.
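The rank-two factorization just described can be sketched numerically. The following Python/NumPy fragment is illustrative only (the package discussed in this paper is written in R, and the function name rank2_markers is ours): it builds A and B from the truncated SVD and forms the best rank-two approximation of X.

```python
import numpy as np

def rank2_markers(X):
    # Truncated SVD of X; keep the first two singular triplets.
    U, lam, Vt = np.linalg.svd(X, full_matrices=False)
    A = U[:, :2] * lam[:2]   # row markers, I x 2
    B = Vt[:2, :]            # column markers, 2 x J
    return A, B

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))
A, B = rank2_markers(X)
X2 = A @ B  # rank-two approximation of X (best in least squares, by Eckart-Young)
```

The split of the singular values between A and B shown here is one common choice; Section 2 discusses the alternatives (GH, JK, HJ).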
Let X be a data matrix composed of I individuals measured on J variables. The SVD of X is defined as X = UΛV', where U is a matrix whose column vectors are orthonormal and correspond to the eigenvectors of XX', V is a matrix whose column vectors are also orthonormal and correspond to the eigenvectors of X'X, and Λ is a diagonal matrix containing the singular values arranged in decreasing order. An element of X may be written generically as x_ij = Σ_{s=1}^{min(I,J)} λ_s u_is v_js. The first S elements of u_s and of v_s, combined with the singular values λ_s in different ways, are used as the coordinates for a graphical display of the data. The most common types of biplots are shown in Figure 1. In a biplot, the column markers b_j are shown as arrows and the row markers a_i as points; see Figure 1. The biplot representation makes the projection of the row markers onto the column markers easier. The relationships between individuals and variables are studied through the projection of the points representing individuals onto the vectors representing variables; that is, x_ij ≈ a_i'b_j implies x_ij ≈ ||proj_{a_i/b_j}|| sign(b_j) ||b_j||, where ||proj_{a_i/b_j}|| is the length of the projection of a_i onto b_j (the segment from the origin to the projected point), sign(b_j) is the sign of the projection and ||b_j|| is the module of b_j (length of the segment from the origin to b_j). This means that x_ij is approximately the module of the projection of a_i onto b_j multiplied by the length of b_j, with its corresponding sign. The direction of the vector b_j shows the direction in which the values of the corresponding variable increase. The projections of the points a_i onto a column vector approximate the jth column of X and provide an ordering of the individuals with respect to the corresponding variable. Once a way of representation is defined, it can be interpreted. Thus:
• The distances between points are dissimilarities between the corresponding individuals, especially if they are well represented. Individuals that are far away from each other have a larger Euclidean distance (ED) and vice versa. In Figure 2, the largest ED is observed between individuals a_1 and a_8, and the smallest ED is obtained between a_5 and a_6.
• In the JK biplot, the line length approximates the variance of the variable; hence, the longest line corresponds to the largest variance. From Figure 2, the variable b_3 has the largest variance among the variables, while the variable b_2 has the smallest variance. The cosine of the angle between two vectors approximates the correlation between the variables they represent. Thus, as this angle goes to 90 (or 270) degrees, the corresponding correlation decreases. An angle of 0 or 180 degrees reflects a correlation of 1 or −1, respectively.
• The relationships between individuals and variables can be interpreted in terms of scalar products, that is, through the projections of the points onto the arrows. This permits us to know which variables differentiate among groups of individuals. If the projection falls on the origin, the value of the observation is approximately the average of the respective variable. Thus, as the projection of an individual increases along the direction of a vector representing a variable, this individual moves away from the average of that variable, whereas the opposite occurs when the projection increases in the reverse direction. Therefore, in Figure 2, individual a_2 stands out with the largest value of the variable b_4, followed by a_1, a_7 and a_9.
• In the HJ biplot, the search for the variables that differentiate individuals is made through the interpretation of the factorial axes, that is, the new variables that are linear combinations of the original variables, and through the relationships of these new variables with the observed variables.
• The measure of the relationship between the axes of a biplot and each of the observed variables is called the relative contribution (RC) of the factor to the element, which represents the proportion of the variability of each variable explained by the factor. This measure is interpreted like the coefficient of determination in regression. In fact, if the data are centered, it is the coefficient of determination in the regression of each variable on the corresponding axis. The RC permits us to know which variables are more related to each axis (Axis 1 and Axis 2) and, therefore, which variables are involved in the ordering of the individuals along the projections on each axis. Because the axes are built to be independent, the RCs of each axis to each of the variables are independent, and then it is possible to calculate the RC of a plane by adding the RCs of the axes that form the plane.
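The marker choices behind the GH, JK and HJ biplots described in this section can be summarized in a short, illustrative Python/NumPy function (the name biplot_markers and its interface are ours; the package itself is written in R):

```python
import numpy as np

def biplot_markers(X, kind="HJ", s=2):
    # SVD-based markers: GH uses A = U, B = V Lambda;
    # JK uses A = U Lambda, B = V; HJ uses A = U Lambda, B = V Lambda.
    U, lam, Vt = np.linalg.svd(X, full_matrices=False)
    U, lam, V = U[:, :s], lam[:s], Vt[:s, :].T
    if kind == "GH":
        return U, V * lam
    if kind == "JK":
        return U * lam, V
    return U * lam, V * lam  # HJ biplot of Galindo (1986)
```

With s equal to the rank of X, the GH and JK markers reproduce X exactly through the scalar products a_i'b_j; with s = 2, they give the rank-two approximation displayed in the biplot.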
Properties of the markers in the JK biplot. In this biplot, we use the metric B'B = I, such that:
• The scalar products of the individuals of X with the identity metric are the scalar products of the row markers included in A in the full space; that is, XX' = AA'.
• The ED between two individuals of X and the ED between the corresponding row markers in the full space are the same.
• The row markers and the individual coordinates are equal in the PC space; that is, if Ψ is a matrix containing the individual coordinates in the PC space, then A = Ψ.
• The column coordinates of X are the projection of the original axes onto the PC space; that is, the projection of each row marker onto the column markers is an approximation of the individual values on the corresponding variables.
• The quality of representation of the rows is better than that of the columns.
Properties of the markers in the GH biplot. In this biplot, we use the metric A'A = I, such that:
• The scalar products of the columns of X are the scalar products of the column markers; that is, X'X = BB'.
• If X has been centered by columns, the squared lengths of the vectors representing the column markers approximate the covariances of the corresponding variables and, as a consequence, the three following properties arise:
- The squared length of a column vector approximates the variance of the corresponding variable, whereas the length of the vector approximates the standard deviation (SD) of that variable.
- The cosine of the angle formed by two column markers approximates the correlation between the corresponding variables.
- The ED between two variables is the ED between the corresponding column markers.
• The coordinates in B are the importance of the variables on the principal axes.
Revista Colombiana de Estadística 37 (2014) 367-397
• The Mahalanobis distance between two individuals of X, computed with the metric Σ^{-1}, where Σ is an estimate of the corresponding variance-covariance matrix, can be approximated by the ED between the corresponding row markers.
• If X is centered by columns, the row marker coordinates are the individual coordinates in the PC space and then A contains the scores on the standardized PCs.
• The scalar products of the row markers are the scalar products of the rows of X with the metric (X'X)^{-1} in the column space; that is, X(X'X)^{-1}X' = AA'.
• The quality of representation of columns is better than that for the rows.
Properties of the markers in the HJ biplot. In this biplot, the properties of the row markers are the same as in the JK biplot, whereas those of the column markers are the same as in the GH biplot. The rules for interpreting the HJ biplot are a combination of the rules used in classical biplots, CA, factor analysis (FA) and multidimensional scaling. Specifically, we have that:
• The distances between row markers are interpreted as inverse similarities, in such a way that closer individuals are more similar, which allows clusters of individuals with similar profiles to be identified.
• The lengths of the column vectors approximate the SD of the variables.
• The cosines of the angles between the column vectors approximate the correlations between variables. Hence, small acute angles are associated with highly positively correlated variables; obtuse angles close to a straight angle are associated with highly negatively correlated variables; and right angles are associated with uncorrelated variables. Analogously, the cosines of the angles between the variable markers and the PCs approximate the correlations between them, whereas for standardized data they approximate the factor loadings in FA.
• The location in the plot of the orthogonal projections of the row markers onto a column marker allows us to approximate the ranking of the row elements in that column. Thus, as the projection of a point (individual) moves away from the center of gravity (the average coordinate point), the value that this individual takes on the variable gets farther from its mean.
• Row and column markers can be shown in the same Cartesian system with an optimal quality of representation. In the CA context, Greenacre (1984) and Lebart, Morineau & Piron (1995) proved that the clouds of row and column points have the same eigenvalues and that barycentric relationships exist between them. The relationships proposed by Galindo (1986) are similar; that is, the relations between the eigenvectors U and V are U = XVD^{-1} and V = X'UD^{-1}. Hence, the markers can be written as A = XBD^{-1} and B = X'AD^{-1}. Therefore, the row coordinates are weighted means of the column coordinates, where the weights are the values of X, and the same applies for the columns.

Goodness of Fit
To assess the goodness of fit in S dimensions, we need to know the proportion of the variability of X explained by the approximated matrix. Because of the least-squares properties of the SVD and the orthonormality of U and V, the total variability can be split into an explained variability and a residual variability expressed in terms of the squared singular values as Σ_{s=1}^{R} λ_s² = Σ_{s=1}^{S} λ_s² + Σ_{s=S+1}^{R} λ_s², where R is the rank of X. This expression shows that the sum of the first S squared singular values divided by the total sum of the squared singular values is a way to assess the amount of total variability explained by the first S singular vectors. If the explained total variability is large, the graph given by the first S singular vectors provides a good representation of the initial matrix. If only a small proportion of such variability is explained by the first singular vectors, the rest of the variability is explained by vectors of higher dimensions. If the data are centered by columns, individuals located near the origin may have measures close to the variable means, or their variability may be explained by higher dimensions. In the same way, variables located near the origin may have small variability or may not be well represented in these dimensions. The estimates of the row and column markers for each biplot and their quality of representation are shown in Table 1.
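The explained proportion of total variability can be computed directly from the singular values. The following Python/NumPy sketch (the function name explained_variability is ours, introduced only for illustration) implements the ratio of the first S squared singular values to their total sum:

```python
import numpy as np

def explained_variability(X, s=2):
    # Sum of the first s squared singular values over the total sum,
    # i.e., the proportion of variability captured by the first s axes.
    lam = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(lam[:s] ** 2) / np.sum(lam ** 2))
```

When s equals the rank of X, the proportion is 1; the biplot is displayed when the value for s = 2 is reasonably high.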

Contributions
The quality of representation detailed in Subsection 2.2 is a way to globally measure the fit of an approximation. However, it is also possible to measure the fit individually for units and variables, which is important for interpreting the results of the biplot. These measures are based on the concepts of RC and absolute contribution (AC) proposed in Galindo (1986) and Jambu (1991). The total inertia is the sum of the eigenvalues of a matrix, that is, its trace, used as a measure of the total variability in a data matrix. It is directly related to the physical concept of inertia, which is the tendency of an object in motion to stay in motion and of an object at rest to stay at rest. Note that the total variability of the individual cloud is equal to the total variability of the variable cloud, given by trace(XX') = trace(X'X) = Σ_{s=1}^{S} λ_s², where λ_s is the sth singular value of X. The ACs of the individual i and of the variable j to the variability of the axis s are AC_is = a_is² and AC_js = b_js², respectively, so that the total inertia of the factor s is λ_s² = Σ_i AC_is = Σ_j AC_js. The RCs of the elements i and j to the factor s are RC_is = AC_is/λ_s² and RC_js = AC_js/λ_s², respectively, whereas the RCs of the factor s to the elements i and j are RC_si = a_is²/d²(a_i, 0) = cos²(a_i) and RC_sj = b_js²/d²(b_j, 0) = cos²(b_j), respectively. The RC of the element to the factor measures how much this factor is explained by such an individual or variable.
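The RCs of the factors to the elements (the squared cosines above) can be sketched as follows in Python/NumPy (an illustration under our own naming, using the HJ-style full-space markers A = UΛ and B = VΛ):

```python
import numpy as np

def relative_contributions(X):
    # Full-space markers A = U Lambda, B = V Lambda; the RC of axis s to
    # element i (or j) is the squared cosine of its marker on that axis.
    U, lam, Vt = np.linalg.svd(X, full_matrices=False)
    A, B = U * lam, Vt.T * lam
    rc_rows = A ** 2 / np.sum(A ** 2, axis=1, keepdims=True)
    rc_cols = B ** 2 / np.sum(B ** 2, axis=1, keepdims=True)
    return rc_rows, rc_cols
```

By construction, the RCs of all the factors to a given element sum to one, which is why they are read as proportions of explained variability.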

A Biplot Methodology with Bootstrapping
In this section, we provide some aspects related to bootstrapping, propose a biplot methodology based on bootstrapping, discuss the state of the art of the software developed for biplots and detail the features of the biplotbootGUI package.

Bootstrapping
Statistical theory attempts to answer three basic questions: (i) How should the data be collected? (ii) How should the collected data be analyzed and summarized? (iii) How accurate is this data summary? The third question constitutes part of the process known as statistical inference. Bootstrapping can help to answer this question when a sampling distribution is not available. Suppose a random sample X = (X_1, ..., X_n) is obtained from a population with unknown distribution. Let x = (x_1, ..., x_n) be an observation of X, from which we can obtain the estimate θ̂ = s(x) of a parameter of interest θ, corresponding to an observed value of the estimator θ̂ = s(X), whose accuracy we want to know. A bootstrap sample x* = (x*_1, ..., x*_n) is defined to be a sample of size n drawn with replacement from the observed sample x. Bootstrap replications of θ̂ result from applying the same function s(·) to B bootstrap samples. To calculate the accuracy of the estimator θ̂, the bootstrap estimate of the corresponding SE, SE[s(X)] say, can be used. Its bias can be empirically calculated as B[s(X)] = s̄(x*) − θ̂, where s̄(x*) is the mean of the B bootstrap replications. Algorithm 1 summarizes the bootstrap method to calculate the mentioned SE, which is often used for constructing a confidence interval (CI) for a parameter.
Normal and t distribution-based CIs. Assume the estimator θ̂ is normally distributed (at least approximately) with unknown expectation θ and known SE given by SE[θ̂] = (Var[θ̂])^{1/2}. Hence, the random interval [θ̂_L, θ̂_U] = [θ̂ − z_{1−α/2} SE[θ̂], θ̂ + z_{1−α/2} SE[θ̂]] has probability 1 − α of containing the true value of θ, where z_{1−α/2} denotes the (1 − α/2) × 100th quantile of the standard normal distribution. Thus, a 100 × (1 − α)% CI for θ is [θ̂ ± z_{1−α/2} SE[θ̂]]. These results are meaningful for large enough sample sizes, for example, for n ≥ 25. However, if we have small samples (n < 25), these results can still be correct (Bickel & Krieger 1989), but they are inappropriate for n ≤ 5 (Chernick 1999). In addition, if SE[θ̂] is unknown, we can estimate it with the expression given in Step 3 of Algorithm 1, ŜE[s(X)] say, but in this case Z = (θ̂ − θ)/ŜE[s(X)] still follows, approximately and for large enough sample sizes, a standard normal distribution. Otherwise (smaller sample sizes), we have Z = (θ̂ − θ)/ŜE[s(X)] ∼ t(n − 1), that is, Z is Student-t distributed with n − 1 degrees of freedom, but we additionally need the normality assumption for the population of X. Thus, in this case, a 100 × (1 − α)% CI for θ with small sample sizes is [θ̂ ± t_{1−α/2}(n − 1) ŜE[s(X)]], where t_{1−α/2}(n − 1) denotes the (1 − α/2) × 100th quantile of the t(n − 1) distribution.
Bootstrap normal and t distribution-based CIs. The normal and t distributions do not adjust the CI for θ to account for skewness and/or other aspects that can arise when θ̂ is not the sample mean. The bootstrap normal and t CIs are procedures that adjust for these aspects. Thus, by using the bootstrap method, we can obtain accurate CIs without having to make the normality assumption. This procedure approaches the population distribution directly from the data and builds CIs in the same way as explained for the normal and t distributions. Algorithm 2 summarizes this procedure.
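The bootstrap SE of Algorithm 1 combined with a normal-based CI can be sketched in a few lines of pure Python (the helper boot_se and the toy data are ours, introduced only for illustration):

```python
import random
import statistics

def boot_se(x, stat, B=1000, seed=1):
    # Resample x with replacement B times, apply stat to each bootstrap
    # sample, and take the SD of the replications as the SE estimate.
    rng = random.Random(seed)
    reps = [stat([rng.choice(x) for _ in x]) for _ in range(B)]
    return statistics.stdev(reps)

x = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9]
theta = statistics.mean(x)
se = boot_se(x, statistics.mean)
z = 1.96  # approximate 97.5% quantile of the standard normal
ci = (theta - z * se, theta + z * se)  # bootstrap normal CI for the mean
```

Replacing z by the corresponding t(n − 1) quantile gives the bootstrap t-based interval for small samples.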
Bootstrap quantile-based CI. An alternative to the bootstrap t distribution-based method (boot-t) for constructing bootstrap CIs is the quantile method (boot-q). The boot-t and boot-q methods are based on a simple structure. However, several data analyses involve more complex structures, such as analysis of variance, regression models or time series. Boot-t and boot-q methods for a parameter more complex than the mean were recently proposed by Leiva, Marchant, Saulo, Aslam & Rojas (2014) and can be adapted to still more complex data structures, as occurs with biplots; see Subsection 3.2. Algorithm 3 summarizes the boot-q method.
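A minimal pure-Python sketch of the boot-q idea follows (the function boot_quantile_ci and the toy data are ours): the CI limits are the empirical α/2 and 1 − α/2 quantiles of the sorted bootstrap replications.

```python
import random
import statistics

def boot_quantile_ci(x, stat, alpha=0.05, B=2000, seed=7):
    # boot-q: sort the bootstrap replications of stat and read off the
    # empirical alpha/2 and 1 - alpha/2 quantiles as the CI limits.
    rng = random.Random(seed)
    reps = sorted(stat([rng.choice(x) for _ in x]) for _ in range(B))
    lo = reps[int((alpha / 2) * B)]
    hi = reps[int((1 - alpha / 2) * B) - 1]
    return lo, hi

x = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9]
lo, hi = boot_quantile_ci(x, statistics.median)
```

Unlike the normal and t intervals, this interval needs no distributional assumption and is not forced to be symmetric around the estimate.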

Biplots Based on Bootstrapping
We adapt Algorithms 2 and 3 to measure the accuracy of the following biplot parameters: (B1) goodness of fit; (B2) quality of the approximation for columns; (B3) eigenvalues; (B4) angles between variables; (B5) angles between variables and axes; (B6) RC to the total variability of the jth column element; (B7) RC of the column element j to the qth factor; and (B8) RC of the qth factor to the jth column element. The adaptation of Algorithms 2 and 3 is given in Algorithm 4.
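The adaptation can be illustrated for parameter (B1): resample the individuals (rows of X) with replacement, recompute the SVD on each bootstrap data matrix, and apply the quantile method to the replications of the goodness of fit. The following Python/NumPy fragment is our own simplified sketch of this idea, not the package's R code:

```python
import numpy as np

def boot_goodness_of_fit(X, s=2, B=500, alpha=0.05, seed=3):
    # Resample individuals (rows of X) with replacement, recompute the SVD,
    # and collect the proportion of variability explained by the first s
    # axes; the CI limits are the empirical bootstrap quantiles (boot-q).
    rng = np.random.default_rng(seed)
    I = X.shape[0]
    reps = []
    for _ in range(B):
        Xb = X[rng.integers(0, I, size=I), :]
        lam = np.linalg.svd(Xb, compute_uv=False)
        reps.append(float(np.sum(lam[:s] ** 2) / np.sum(lam ** 2)))
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(10)
X = rng.normal(size=(30, 5))
lo, hi = boot_goodness_of_fit(X)
```

The same resampling scheme yields CIs for the other parameters (B2)-(B8) by replacing the statistic computed on each bootstrap matrix.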

Software for Biplots
Macros for biplots have been implemented in the main commercial and non-commercial statistical software packages. Currently, most commercial statistical software packages include a procedure or macro for generating biplots; see details in Frutos et al. (2014). Specifically, the GGEbiplot software, dedicated to the GGE biplot (www.ggebiplot.com), can also generate the classical biplot. The GGEbiplot program is commercial software and is widely used by agronomists, crop scientists and geneticists; see Yan & Kang (2003), Frutos et al. (2014) and references therein. Vicente-Villardón (2010) implemented a program to perform biplots, called multbiplot, in the commercial software MATLAB (www.mathworks.com/products/matlab). It contains classical, HJ and logistic biplots, among other biplots, as well as simple and multiple CA for contingency tables.
Most of the software available for biplots is developed for specific applications, or as part of general-purpose packages. Consequently, these tools are not very flexible and produce static pictures that limit the interpretation of their results. Tables 2 and 3 contain the main R packages that implement biplot decompositions and/or representations. In these tables, the name of the package, the approach on which it is based, that is, Gabriel (1971), Galindo (1986) or Gower (1992), the main references, and the creation date and last update of the corresponding package are presented, and its main contents and functionalities are discussed.
In Table 4, we provide a review of the R packages mentioning the word "biplot" even though they refer to the joint representation of coordinates calculated with other methods instead of using the biplot decomposition.

The BiplotbootGUI Package
Because none of the packages (commercial and non-commercial) discussed in Subsection 3.3 is suitable for constructing bootstrap CIs for the biplot parameters (B1) through (B8), we developed a new package in the R language that combines the biplots described by Gabriel (1971) and Galindo (1986) with the bootstrap method to display the results of these biplots and their statistical accuracy measures.
As mentioned, a GUI is a type of user interface that allows practitioners to interact with electronic devices such as computers. It is characterized by the use of icons and visual indicators, as opposed to text-based interfaces, typed command labels or text navigation, to represent the information and actions available to the user. The actions are often performed through direct manipulation of the graphical elements. Below, we discuss the features of a GUI, in the R language, for the methodology for biplots based on bootstrapping proposed in this article and implemented in the biplotbootGUI package.
Among the packages listed in Tables 2 and 3 there is a GUI to construct and interact with GGE biplots. It provides eigenvalues, the percentage of variability explained by each of them, the coordinates of individuals and variables, and the contributions of factors to elements. This GUI also allows us to change the background color, the genotype and environment labels, the graph title and font, to show genotypes and environments, and to hide the title, axes and symbols. Furthermore, with this GUI it is possible to move the labels with the mouse and to change the color and text of the labels. Another of the listed packages provides tools for describing community ecology. It has the basic functions of diversity, community ordination and dissimilarity analyses. In addition, it shows biplots from the results of redundancy, canonical correlation and canonical correspondence analyses, which can be used for other types of data as well.
After the R software has been downloaded from cran.r-project.org and installed, the user must download and install the biplotbootGUI package and its dependencies, which are the rgl, tcltk, tcltk2, tkrplot and vegan packages; see Adler & Murdoch (2012), Grosjean (2012), Tierney (2012) and Oksanen et al. (2013). Then, to load the biplotbootGUI package into the R software, the command library(biplotbootGUI) must be entered at the R prompt. Once all these instructions have been followed, the data must be loaded. Hence, one starts the GUI by entering the command biplotboot(data) in the R console, where the data to be analyzed must be in a data frame; see details and examples in Section 4 of applications. Once the GUI has been initialized, a window entitled "bootstrap on classical biplots" emerges; see Figure 3. This window allows us to enter the number of replications and the confidence level used to calculate the CIs. It is also possible to choose the parameters to be considered by the user. After entering and selecting the parameters, one must click on the OK button and a window titled "Options" appears; see Figure 4. In this window, the following options are available:
• Select the type of biplot to be executed (HJ, GH or JK).
• Select the transformation to be performed on the data:
  - Subtract the global mean.
  - Center by columns.
  - Center by rows.
• Change the color, size, label and symbol representing individuals in the graph.
• Change the color, size and label representing variables in the graph.
• Show the axes in the graph.

Given that not all the data are well represented by the first two axes, after clicking the "graph" button a window emerges with the option to choose the number of axes to be retained, according to the variability explained by each axis. After choosing the number of axes to be retained and clicking the "choose" button, a window showing the resulting graph in 2D appears; see Figure 5. This window displays the labels for the two axes, indicating the percentage of variability explained by each of them (72.96% by axis 1 and 22.85% by axis 2). The user can select the axes to be displayed in the graph. At the top of the window, two menus with options to save the graph and to show the biplot in 3D are displayed, whereas three text boxes where the user can change the axes displayed in the graph are at the bottom. Also, the user can move or remove the label of a specific element by clicking the left mouse button, and change the graphical display of such an element by clicking the right mouse button. This window contains two dropdown menus. In the first one, options to copy the graph, save it in different file formats (PDF, postscript, BMP, PNG, JPG/JPEG) or exit are available, whereas the second one provides a 3D graph made by the rgl package; see Adler & Murdoch (2012).
The user can rotate or zoom in on this graph by clicking the left or right mouse button. Together with this window, a graph showing the coordinates for the variables computed over all of the replications is shown. The GUI provides two text files. In the first one, the parameters of the biplot analysis (see B1-B8) are saved, whereas in the second one, tables with the values for the mean, SE, bias and the lower and upper limits of the bootstrap CIs are provided. These two text files are automatically saved, together with all the graphs containing the histogram and quantile versus quantile (QQ) plot of the estimates calculated by bootstrapping for the parameters selected in the first window. In the histogram, the solid line represents the estimate of the biplot parameter obtained from bootstrapping, whereas the dashed line is its value obtained from the biplot. The x-axis of the QQ plot shows the theoretical quantiles and the y-axis the empirical quantiles.

Numerical Applications
In this section, we evaluate the performance and potentiality of our methodology by means of the biplotbootGUI package using both simulated and real-world data.

Simulated Data
To evaluate the performance of the biplotbootGUI package, an HJ biplot with the transformation "centering by columns" has been performed. We simulate data of 100 individuals on 5 normally distributed variables (V1, ..., V5), generated to have correlations Cor(V1, V2) = 0.50, Cor(V2, V3) = 0.80 and Cor(V4, V5) = 0.90. The number of bootstrap replications is 1,000 and the confidence level is 95%. The time involved in a bootstrap replication is usually small. For example, the time spent in the calculations for a 1,000×5 matrix is less than four minutes for 1,000 replications.

First, we explain the main results of the classical biplot. In Table 5, we observe the variability explained by each axis (Axis 1, Axis 2 and Axis 3). Note that the first eigenvalue explains more than 50% and the first three axes explain more than 94.27% of the total variability. Table 6 shows the RCs of the factors to the column elements in the first three axes. Note that all the variables are well represented by the first two axes, except the variable V1, which is well represented by the third axis. The biplot representation using the first two axes (Axis 1 and Axis 2) is shown in Figure 7 (left). The covariation structure shows a very high correlation between the variables V4 and V5, and between V2 and V3, represented by acute angles. Variables V2 and V3 have a high correlation with V1; however, they present no correlation with V4 and V5, since they are almost orthogonal; see Table 7.

Table 7: Angles between variables for simulated data.

Second, we explain the results of applying the bootstrap method. Goodness of fit and eigenvalues are explained next. Figure 8 shows the histogram and QQ plot representing the values of the quality of approximation for the 1,000 bootstrap replications. We denote by "lower-t" and "upper-t" the lower and upper limits of the CIs based on the boot-t method, and by "lower-q" and "upper-q" those based on the boot-q method. Table 11 provides the observed values for the mean, SE, bias and these limits. Notice that the observed value and its approximation are very close. The same results for the eigenvalues are provided in Table 8. Figure 6 shows the histogram and QQ plot for the first eigenvalue (a similar behavior is observed for the other four eigenvalues, whose plots are omitted here, but are available upon request for interested users). Note that the observed and estimated values practically do not differ, and a similar conclusion is reached for the CIs. For each of the five eigenvalues resulting from the SVD of the simulated data, the values were calculated from 1,000 bootstrap replications.
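For readers who wish to reproduce the core of this experiment outside the GUI, the computation can be sketched as follows. The fragment below is an illustrative Python/numpy sketch (not the R code of the biplotbootGUI package): it simulates correlated normal data with the structure described above and bootstraps the first eigenvalue, with a 95% percentile (boot-q) interval.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate 100 individuals on 5 normal variables with
# Cor(V1,V2)=0.50, Cor(V2,V3)=0.80 and Cor(V4,V5)=0.90
# (remaining pairs uncorrelated).
R = np.eye(5)
R[0, 1] = R[1, 0] = 0.50
R[1, 2] = R[2, 1] = 0.80
R[3, 4] = R[4, 3] = 0.90
X = rng.multivariate_normal(np.zeros(5), R, size=100)

def first_eigenvalue(M):
    Mc = M - M.mean(axis=0)          # "centering by columns"
    return np.linalg.svd(Mc, compute_uv=False)[0] ** 2

obs = first_eigenvalue(X)

# Bootstrap: resample the individuals (rows) with replacement, B = 1000.
B = 1000
reps = np.array([first_eigenvalue(X[rng.integers(0, 100, 100)])
                 for _ in range(B)])

# Percentile (boot-q) 95% CI, plus mean, SE and bias of the replications.
lo_q, up_q = np.quantile(reps, [0.025, 0.975])
print(obs, reps.mean(), reps.std(), reps.mean() - obs, lo_q, up_q)
```

The histogram and QQ plot discussed above are then simply plots of the vector of replications reps.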

Real-World Data
To illustrate the potentiality of the biplotbootGUI package, we use real-world data collected by Anderson (1935) and included in the R software. The data set corresponds to measurements in cm of the variables sepal length (Y1) and width (Y2) and petal length (Y3) and width (Y4), for 50 flowers from each of three species of iris. The species are Iris setosa, versicolor and virginica. An HJ biplot with the transformation "standardize by columns" is performed. Once again, the number of replications entered is 1,000 and the confidence level is 95%.

First, we show the main results of the HJ biplot. Table 9 presents the percentage (%) of variability explained by each axis, from which the first eigenvalue explains more than 70% and the first three axes explain almost 100% of the total variability. Table 10 provides the RCs of the factors to the column elements in the first three axes. Notice that all the variables are well represented by the first axis, except the variable Y4, which is well represented by the second axis. The biplot representation using the first two axes is shown in Figure 7 (right). The covariation structure shows a very high correlation between Y3 and Y4, represented by an acute angle. Both variables have a high correlation with the variable Y1. However, there is no relation with Y2, since a right angle is obtained. Table 10 also reports the angles between the variables in the plane representing the first two axes.

Figure 8 shows the histogram and QQ plot representing the values of the quality of approximation for the 1,000 bootstrap replications. Table 11 provides the observed values for the mean, SE, bias, lower-t, upper-t, lower-q and upper-q for the simulated and real-world (iris) data. Note that there is no difference between the observed value and its approximation, whereas the endpoints of both intervals are similar. Table 12 provides the RCs to the total variability of the variables based on 1,000 bootstrap replications. Note that there are no differences between the observed values and their estimates, whereas the width of the CIs is small, suggesting a high accuracy of our methodology. Figure 8 shows the histogram and QQ plot for the RCs to the total variability of the variable Y1. A similar behavior is observed for the other three variables, whose plots are omitted here, but are available upon request for interested users.
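The angle-based reading used here (acute angle for highly correlated variables, right angle for uncorrelated ones) rests on the fact that, for column-standardized data, the cosine of the angle between two column markers in the full space equals the sample correlation between the corresponding variables; in the plane of the first two axes this holds approximately. A minimal Python sketch with hypothetical data (illustrative only, not the package code) verifies this property:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: two correlated columns plus one independent column.
n = 200
z = rng.standard_normal(n)
X = np.column_stack([z + 0.3 * rng.standard_normal(n),
                     z + 0.3 * rng.standard_normal(n),
                     rng.standard_normal(n)])

# Standardize by columns, as in the HJ biplot of the iris analysis.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# HJ-style column markers: rows of V * diag(singular values),
# so that G @ G.T = Xs.T @ Xs.
U, d, Vt = np.linalg.svd(Xs, full_matrices=False)
G = Vt.T * d

def cos_angle(g1, g2):
    # Cosine of the angle between two column markers.
    return g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))

# In the full space the cosine equals the sample correlation.
corr = np.corrcoef(Xs, rowvar=False)
print(cos_angle(G[0], G[1]), corr[0, 1])  # these two numbers coincide
```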

Discussion and Conclusions
Factorial analysis techniques only provide researchers with point estimates of their results. In this work, we have proposed a methodology that combines bootstrap and biplot methods to calculate confidence intervals for the results of biplots, in order to provide measures of their accuracy. This idea has been applied in several multivariate techniques that incorporate a singular value decomposition. Although there are some packages in the R software to perform biplots, as detailed in this paper, these packages only provide estimated results as point values, and no information about their accuracy is available. For this reason, we have developed a new package in this software to implement our methodology. Specifically, in this paper we have proposed a graphical methodology based on bootstrap confidence intervals for the main parameters of biplots. These parameters help to interpret the contributions of the elements and axes of the biplot and correspond to goodness of fit, quality of the representation, and variability and relationships among variables. The proposed methodology may be considered as an inferential version of classical biplots and has been implemented in the new biplotbootGUI R package. We have detailed the features of this package and validated our methodology with numerical applications based on simulated and real-world data. The numerical results have shown the good performance and potentiality of our methodology, as well as the simple and easy manner of working with the biplotbootGUI package. As a supplement to our work, we have also provided a review of the key theoretical contributions and computational implementations of biplot methods, covering the period from 1971 to the present.
Other ways to calculate measures of accuracy, such as jackknife, Markov chain Monte Carlo and permutation methods, have been proposed in the literature, as well as ways to calculate confidence intervals other than those proposed in this paper. In future works, we may consider some of these methods to provide different measures of accuracy for the results obtained by biplot methods.

The biplot in Figure 2 shows a strong relationship between the variables b4 and b5, and a weak relationship between the variables b2 and b3, and between b1 and b3. The correlation between the variables b3 and b6 is negative. Variables with the same direction involve multicollinearity, as observed in Figure 2 for the variables b1 and b2. Also, biplots reveal multivariate outliers and can be used to detect clusters, such as the group formed by the individuals a1, a2, a7 and a9.

Figure 2: Biplot representation of a matrix with 6 variables and 9 individuals.

Algorithm 4: Adaptation of Algorithms 2 and 3
1: Follow Steps 1 and 2 of Algorithm 1, obtaining the bootstrap replications θ*1, ..., θ*B.
2: Calculate the empirical mean, SE and bias of the estimator with the bootstrap samples by using the expressions Ê[s(X)] = (1/B) Σ_{b=1}^{B} s_b(x*), ŜE[s(X)] = ((1/B) Σ_{b=1}^{B} (s_b(x*) − s̄(x*))²)^{1/2} and B̂[s(X)] = s̄(x*) − s(x), respectively, where s̄(x*) denotes the mean of the bootstrap replications.
3: Establish boot-t CIs for the parameters (B1)-(B8) with Step 4 of Algorithm 2.
4: Determine boot-q CIs for the parameters (B1)-(B8) with Step 4 of Algorithm 3.
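Steps 2 and 4 of Algorithm 4 can be sketched in a few lines. The following Python fragment is an illustrative sketch under the assumption of row resampling, not the package's R implementation; the boot-t interval of Step 3 additionally requires the studentized quantiles of Algorithm 2 and is omitted. It computes the bootstrap mean, SE, bias and the boot-q percentile interval for a generic statistic s:

```python
import numpy as np

def bootstrap_summary(X, s, B=1000, alpha=0.05, seed=None):
    """Bootstrap mean, SE, bias and boot-q CI for a statistic s(X),
    resampling the rows (individuals) of X with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    reps = np.array([s(X[rng.integers(0, n, n)]) for _ in range(B)])

    mean = reps.mean()                          # E^[s(X)]
    se = np.sqrt(((reps - mean) ** 2).mean())   # SE^[s(X)], with the 1/B factor
    bias = mean - s(X)                          # B^[s(X)]
    lo_q, up_q = np.quantile(reps, [alpha / 2, 1 - alpha / 2])  # boot-q CI
    return {"mean": mean, "se": se, "bias": bias, "ci_q": (lo_q, up_q)}

# Example statistic: first eigenvalue of the column-centered data matrix.
def first_eigenvalue(M):
    Mc = M - M.mean(axis=0)
    return np.linalg.svd(Mc, compute_uv=False)[0] ** 2

X = np.random.default_rng(0).standard_normal((100, 5))
out = bootstrap_summary(X, first_eigenvalue, B=500, seed=1)
```

The returned dictionary mirrors the columns of the tables produced by the GUI: mean, SE, bias, and the lower and upper limits of the CI.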

Figure 5: Window with a biplot representation in two dimensions.

Figure 6: Histogram (left) and QQ plot (right) for the first eigenvalue of the simulated data SVD.

Figure 8: Histograms (left) and QQ plots (right) for the quality of approximation with the simulated (1st panel) and iris (2nd panel) data sets, and for the RCs to the total variability of the first variable of the iris data (3rd panel).

Table 1: Markers and their quality of representation.

Table 2: Biplots in R.

Table 5: Eigenvalues and variability % explained by each of them with simulated data.

Table 6: RCs of the factors to the column elements for simulated data.

Table 8: Results for the eigenvalues with simulated data.

Table 9: Eigenvalues and variability % explained by each of them for iris data.

Table 10: RCs of the factors to the columns and angles between variables for iris data.

Table 11: Results of the approximation quality for the indicated data set.

Table 12: Results of the contributions to the total variability for iris data.