Prediction of Stress Increase at Ultimate in Unbonded Tendons Using Sparse Principal Component Analysis

While internal and external unbonded tendons are widely used in concrete structures, an analytical solution for the increase in unbonded tendon stress at ultimate strength, $$\Delta f_{ps}$$, is challenging due to the lack of bond between strand and concrete. Moreover, most analysis methods do not correlate well with test results, in part because the available test data are limited. The aim of this paper is to use advanced statistical techniques to develop a solution to the unbonded strand stress increase problem, which phenomenological models by themselves have handled poorly. In this paper, Principal Component Analysis (PCA) and Sparse Principal Component Analysis (SPCA) are employed on different sets of candidate variables, drawn from the material and sectional properties in a database of continuous unbonded tendon reinforced members from the literature. Predictions of $$\Delta f_{ps}$$ are made via Principal Component Regression models, and the proposed method, linear models using SPCA, is shown to improve on current models (best case $$R^{2}$$ of 0.27, measured-to-predicted ratio [λ] of 1.34) with linear equations. These models produced $$R^{2}$$ values of 0.54 and 0.70 and λ values of 1.03 and 0.99 for the internal and external datasets, respectively.


Introduction
The use of unbonded tendons, either internal or external, increases cost-efficiency, provides aesthetic satisfaction for users, and achieves fast and efficient construction (Cooke et al. 1981; Naaman 2005; Roberts-Wollmann et al. 2005). However, analysis of structures using unbonded tendons is exceptionally difficult and has been the subject of many international research projects, most of which attempt to simplify the problem considerably. Although numerous studies have been conducted to estimate the tendon stress increase at nominal strength, an analytical solution for the increase in unbonded tendon stress ($$\Delta f_{ps}$$) is challenging due to the lack of bond between strand and concrete, and most analysis methods do not correlate well with test results because the available test data are limited.
Current design for unbonded tendon reinforced members in the United States uses either the American Concrete Institute 318 (ACI 318) equation (ACI 2008) or the American Association of State Highway and Transportation Officials Load and Resistance Factor Design (AASHTO LRFD) guidelines (AASHTO 2010), the latter given by
$$\Delta f_{ps} = 6200\,\frac{d_{ps} - c}{L}\left(1 + \frac{N}{2}\right) \quad (2)$$
Both of the above methods are relatively easy to implement in design. However, there are concerns with both. The ACI model is a curve fit to only a handful of experimental results obtained prior to 1978 (Mojtahedi and Gamble 1978; Mattock et al. 1971).

International Journal of Concrete Structures and Materials
McKinney et al. Int J Concr Struct Mater (2019) 13:20
The AASHTO method is not dependent on an experimental curve fit for $$\Delta f_{ps}$$, but is dependent on an estimation of the scaled plastic hinge length (ψ) from Tam and Pannell (1976). The ACI method in particular is well liked by designers due to its simplicity. There are considerably more prediction methods available in the literature as well as in international design codes. Maguire et al. (2017) performed an in-depth review of various prediction methods based on their common mechanisms and empirical assumptions. The collapse mechanism model uses the relationship between strain, angle of rotation and applied load. The AASHTO LRFD method, based on Roberts-Wollmann et al. (2005) and MacGregor (1989), is considered a collapse mechanism model. Other collapse mechanism models have been developed by the British Standard Institution (BSI 2001) and Harajli (2011), among others. Another category, called bond-reduction models, calculates a bond-reduction coefficient (Ω) to reduce the strength of a cross section with unbonded reinforcement. Probably the most well-known bond-reduction model was introduced by Naaman and Alkhairi (1991). It was adopted in the 1994 AASHTO LRFD code but replaced in the 1998 AASHTO LRFD code, and it also included statistical fitting to some degree. Alternatively, statistical analysis methods have been developed using the experimental data available at the time. The 1963 ACI code (ACI 1963) and European design codes, including the German (DIN 1980) and Swiss (SIA 1979) codes, are widely accepted for design and real-world application, and are statistically based. The 1963 and current ACI methods purposely under-predict the strand stress increase in most cases and, when compared to other methodologies, provide something closer to a lower-bound prediction than an accurate prediction. Maguire et al. (2014, 2017) indicated considerable phenomenological differences between continuous unbonded tendon reinforced members, which are common, and simply supported members, which are uncommon in design. Interestingly, most methods from the literature evaluated prediction performance against datasets consisting mostly of simply supported members. In response, Maguire et al. (2017) compiled the largest known international database of continuous members, containing only 83 members, which illustrates the dearth of data on this subject. This database only contains tests that have vetted and valid test setups and strand stress measurements. Considerable discussion was devoted to the reasons for inclusion or exclusion of many test programs, and future experimental needs were outlined. In order to consider multiple variables, including internal and external tendons, Maguire et al. (2017) also suggested an update to the AASHTO LRFD collapse mechanism model (ψ = 14 and ψ = 18.5 for internal and external tendons, respectively) based on statistical analysis. They found nearly all types of prediction methods to have very low prediction accuracy, with a best case $$R^{2}$$ of 0.27 and a best measured-to-predicted ratio (λ) of 1.34, neither of which indicates ideal prediction.
With the overall lack of available data and targeted research programs to drive improved phenomenological models for unbonded tendon reinforced structures, a statistical approach may provide the best prediction for $$\Delta f_{ps}$$ (McKinney 2017). The advantages of a statistically based model are clear. Like the ACI equation, statistical models can be easily implemented, do not require excessive design time, and do not burden the engineer with the several design iterations that bond-reduction and collapse mechanism models require. Furthermore, they can be optimized to fit the data, and cross validation can be used to verify their accuracy.
The aim of this paper is to use advanced statistical techniques to develop a solution to the unbonded strand stress increase problem, which phenomenological models have handled poorly. While many engineers would prefer a phenomenological model, many also have an affinity for the purely empirical ACI equation, which does not require complicated analysis but has noted shortcomings. In this paper the authors present a novel approach to predict the increase in tensile stress in unbonded tendons using Principal Component Analysis (PCA) and Sparse Principal Component Analysis (SPCA). PCA is a statistical procedure that re-expresses the variable information in an orthogonal basis so that the most significant directions of variation can be selected (Jolliffe 2002). PCA has gained considerable popularity in structural engineering in recent years in combination with machine learning and structural health monitoring (Yan et al. 2008; Zhang et al. 2014), vibrations (Kuzniar and Waszczyszyn 2006; Hua et al. 2007; Kesavan and Kiremidjian 2012; Zolghadri et al. 2016; Zolghadri 2017) and image-based crack detection (Abdel-Qader et al. 2004), because it is especially useful for analyzing large datasets with many variables. SPCA uses the Least Absolute Shrinkage and Selection Operator (LASSO) to shrink the loadings of relatively insignificant variables in the proposed statistical model, which simplifies the model further (Zou et al. 2006; Chang et al. 2017). Ultimately, the LASSO technique identifies the most important variables from a larger set in order to develop the most effective prediction equation with limited human influence.
The experimental and analytical literature is somewhat mixed on which variables are most important for predicting tendon stress increase. Hemakom (1970) and Gebre-Michael (1970) tested five continuous, one-way slabs, varying concrete strength, the level of prestress, prestressing reinforcing ratio and pattern loading. They found that the percentage of prestressed reinforcement varied inversely with $$\Delta f_{ps}$$ and concrete strength varied directly with $$\Delta f_{ps}$$, while the level of effective prestress had no effect. Chen (1971) performed similar tests on two one-way slabs and found that the distribution of cracks and the moment capacity of the member were increased by including bonded reinforcement. Trost et al. (1984) found that the main factors influencing their experiments were the compressive strength of the concrete and the level of prestress, and that $$\Delta f_{ps}$$ was proportional to the sum of the deflections at the critical sections, while span-to-depth ratio was insignificant. Harajli and Kanj (1991) tested 26 simply supported beams with internal unbonded tendons. The beams varied in span-to-depth ratio, loading, and mild and prestressing reinforcement. This study found that $$\Delta f_{ps}$$ increased as the mild reinforcing ratio decreased. Additional observations were that the type of loading (single point load or third point loads) and the span-to-depth ratio (ranging from 8 to 20) did not affect tendon stress increases, contradicting many analytical and experimental studies (Mojtahedi and Gamble 1978; Naaman and Alkhairi 1991; Lee et al. 1999). Harajli et al. (2002) performed tests on nine two-span continuous, externally prestressed beam members and found that the geometry of load within a span, the area of external prestressing steel and second order effects reduce $$\Delta f_{ps}$$. A reduction in steel stress with increasing span-to-depth ratio was also noticed and attributed to its influence on plastic hinge length and rotation capacity.
Lou and Xiang (2006) validated a finite element model on the Harajli and Kanj (1991) dataset. This numerical investigation found that $$\Delta f_{ps}$$ increases significantly with an increase in the yield stress of the bonded reinforcement. Furthermore, the stress increase was shown to decrease significantly with an increase of the combined reinforcing index, but this was attributed to the change in mild steel reinforcing index, verifying similar behavior reported by Du and Tao (1985). Ozkul et al. (2008) performed an experimental investigation of 25 simply supported members with internal unbonded tendons. The experimental results showed that effective prestress and the area of prestressed reinforcement were important, but that mild steel and concrete strength were not, even though plastic hinge lengths were affected by the mild steel provided. An inverse relationship was noted between $$\Delta f_{ps}$$ and the prestressed reinforcement indices, which was attributed to the sharing of tensile force between prestressed and non-prestressed reinforcement. Lou et al. (2013), in a numerical investigation, calibrated a finite element model to the two-span members tested by Harajli et al. (2002) and indicated that $$\Delta f_{ps}$$ in external tendons of continuous beams is most strongly related to rotational capacity and non-prestressed reinforcement.
The experimental and analytical literature summarized above conflicts on nearly every investigated variable. The reason for this is likely the relatively narrow focus of each investigation. In order to identify the variables that are most important, this paper uses the LASSO technique with SPCA to select them from a large dataset.
This paper focuses on improving the accuracy of $$\Delta f_{ps}$$ predictions for internal and external unbonded tendons separately. Sets of candidate variables, drawn from the material and geometric properties in the database compiled by Maguire et al. (2017), are considered in order to analyze the significant factors in the database for prediction of $$\Delta f_{ps}$$ and to construct models. It is acknowledged that variables like deviator type and location are important for design predictions, but since this information is not present in the database, second order effects are neglected for the purposes of this investigation. The performance of each PCA model is compared against a benchmark PCA model involving all of the variables. Likewise, the authors compare the SPCA models to an SPCA benchmark. Additionally, these predictions are compared to other prediction methods from the literature on the same database. The results show that improvements in predictions can be made with a simplified SPCA regression model.

Principal Component Analysis (PCA) and Sparse PCA (SPCA)
PCA is a widely used statistical technique for dimension reduction. It takes linear combinations of all of the variables to create a reduced number of uncorrelated variables (called principal components, or PCs) that still express a majority of the information from the original data (Lattin et al. 2003). The number of principal components selected, which is usually much smaller than the number of original variables, is determined by considering how much information is retained at the cost of simplifying the data. In addition to dimension reduction, another typical scenario where PCA works well is when a level of collinearity exists in the data, i.e., some or all of the predictor variables are correlated. After applying PCA, the resulting principal components are uncorrelated, and hence the replication of information in the original variables is removed. Let $$\mathbf{X} = (x_{ij})$$, $$i = 1, \ldots, n$$, $$j = 1, \ldots, p$$, be the $$n \times p$$ data matrix of $$n$$ observations on the $$p$$-dimensional random vector $$X = (X_1, X_2, \ldots, X_p)^T$$. Define the $$1 \times p$$ mean vector $$\bar{\mathbf{x}}$$ as
$$\bar{\mathbf{x}} = \frac{1}{n}\mathbf{1}_n^T\mathbf{X} \quad (3)$$
That is, the jth element of $$\bar{\mathbf{x}}$$ is the sample mean of the jth variable. The $$p \times p$$ sample covariance matrix $$\mathbf{S}$$ is computed as
$$\mathbf{S} = \frac{1}{n-1}\left(\mathbf{X} - \mathbf{1}_n\bar{\mathbf{x}}\right)^T\left(\mathbf{X} - \mathbf{1}_n\bar{\mathbf{x}}\right) \quad (4)$$
where $$\mathbf{1}_n$$ is an $$n \times 1$$ column vector of ones. Let $$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$$ be the eigenvalues of $$\mathbf{S}$$ in descending order, and let $$u_1, u_2, \ldots, u_p$$ be the corresponding eigenvectors. The first principal component $$Y_1$$ is defined as the linear combination of the $$X_j$$'s that has the largest variance under the constraint that the coefficient vector has unit norm. It turns out that the coefficient vector, which is called the loading of $$Y_1$$, is estimated by $$u_1$$, the eigenvector of $$\mathbf{S}$$ corresponding to the largest eigenvalue $$\lambda_1$$. The second principal component $$Y_2$$ is the linear combination of the $$X_j$$'s that has the second largest variance under the unit norm constraint and is uncorrelated with $$Y_1$$, and the loading of $$Y_2$$ is estimated by $$u_2$$.
In general, the kth principal component is computed as
$$Y_k = u_k^T X = u_{k1}X_1 + u_{k2}X_2 + \cdots + u_{kp}X_p \quad (5)$$
Subsequent analyses are usually performed based on the first $$q$$ uncorrelated principal components (as opposed to the original $$p$$ variables), whose observed values are given by the principal component score matrix
$$\mathbf{Y} = \left(\mathbf{X} - \mathbf{1}_n\bar{\mathbf{x}}\right)\mathbf{U} \quad (6)$$
Here, $$\mathbf{U} = (u_1, u_2, \ldots, u_q)$$ is the $$p \times q$$ loading matrix. To mitigate the effect of scaling, it is common practice to standardize the variances before performing a PCA. In such a situation, the sample correlation matrix $$\boldsymbol{\rho}$$ is used in place of the sample covariance matrix $$\mathbf{S}$$,
$$\boldsymbol{\rho} = \mathbf{D}^{-1/2}\,\mathbf{S}\,\mathbf{D}^{-1/2} \quad (7)$$
where $$\mathbf{D}$$ is the diagonal matrix of the diagonal entries of $$\mathbf{S}$$, i.e.
$$\mathbf{D} = \mathrm{diag}(s_{11}, s_{22}, \ldots, s_{pp}) \quad (8)$$
It is equivalent to using the sample covariance matrix when the variances of all variables are standardized to be 1.
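The PCA steps described above (mean vector, covariance matrix, correlation matrix, eigendecomposition, loadings and scores) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: it assumes NumPy is available, the function name `correlation_pca` and the synthetic data are hypothetical, and the scores are computed from standardized data because the correlation matrix is used.

```python
import numpy as np

def correlation_pca(X, q):
    """PCA on the sample correlation matrix (illustrative helper, not from the paper).

    Returns all eigenvalues (descending), the p x q loading matrix U,
    and the n x q score matrix of the standardized data.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    x_bar = X.mean(axis=0)                       # sample mean of each variable
    centered = X - x_bar
    S = centered.T @ centered / (n - 1)          # sample covariance matrix
    d = np.sqrt(np.diag(S))                      # sample standard deviations
    R = S / np.outer(d, d)                       # D^{-1/2} S D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(R)         # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U = eigvecs[:, :q]                           # loadings of the first q PCs
    scores = (centered / d) @ U                  # scores of standardized data
    return eigvals, U, scores

# Synthetic data only: the third column is nearly collinear with the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X_demo = np.column_stack(
    [base[:, 0], base[:, 1], 2.0 * base[:, 0] + 0.1 * rng.normal(size=50)]
)
eigvals, U, scores = correlation_pca(X_demo, q=2)
```

Because the third synthetic column nearly duplicates the first, most of the variation collapses onto the first component, which is the collinearity situation the text describes. The scores are uncorrelated by construction.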
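One major drawback of PCA is that each principal component is a linear combination of all of the predictor variables, which often makes the results difficult to interpret. To address this problem, Zou et al. (2006) proposed Sparse Principal Component Analysis (SPCA) as an alternative that shrinks some of the coefficients to 0 by producing a sparse estimate of the loading matrix via penalized regression. Technically, this is done by expressing PCA as a regression problem with a quadratic penalty, which essentially forms a ridge regression:
$$(\hat{\mathbf{A}}, \hat{\mathbf{B}}) = \arg\min_{\mathbf{A}, \mathbf{B}} \sum_{i=1}^{n} \left\| \mathbf{x}_i - \mathbf{A}\mathbf{B}^T\mathbf{x}_i \right\|^2 + \lambda \sum_{k=1}^{q} \left\| \beta_k \right\|^2 \quad \text{subject to } \mathbf{A}^T\mathbf{A} = \mathbf{I}_q \quad (9)$$
Here, $$\mathbf{x}_i$$ is the ith observation (the ith row of $$\mathbf{X}$$ written as a column vector), $$\mathbf{A} = (\alpha_1, \alpha_2, \ldots, \alpha_q)$$ and $$\mathbf{B} = (\beta_1, \beta_2, \ldots, \beta_q)$$ are two $$p \times q$$ coefficient matrices, and $$\|\cdot\|$$ denotes the Euclidean norm. The normalized vector of $$\hat{\beta}_k$$ gives the approximation to the loadings of the kth principal component, i.e.,
$$\hat{u}_k = \frac{\hat{\beta}_k}{\|\hat{\beta}_k\|} \quad (10)$$
Then, an $$L_1$$ or Lasso penalty (Tibshirani 1996) is added to the optimization criterion to induce sparsity, i.e., shrink some of the coefficients to 0. Thus, the sparse PCA is formulated as
$$(\hat{\mathbf{A}}, \hat{\mathbf{B}}) = \arg\min_{\mathbf{A}, \mathbf{B}} \sum_{i=1}^{n} \left\| \mathbf{x}_i - \mathbf{A}\mathbf{B}^T\mathbf{x}_i \right\|^2 + \lambda \sum_{k=1}^{q} \left\| \beta_k \right\|^2 + \sum_{k=1}^{q} \lambda_k \left\| \beta_k \right\|_1 \quad \text{subject to } \mathbf{A}^T\mathbf{A} = \mathbf{I}_q \quad (11)$$
where $$\|\cdot\|_1$$ denotes the $$L_1$$ norm, i.e., the sum of the absolute values of the elements. The constants $$\lambda$$ and $$\lambda_k$$, $$k = 1, \ldots, q$$, are tuning parameters, of which the $$\lambda_k$$'s are associated with the Lasso penalty and control the amount of shrinkage, i.e., how many coefficients are shrunk to 0. Larger values of $$\lambda_k$$ induce more 0's in $$\beta_k$$. Fitting of SPCA can be carried out in the software R using the package elasticnet (see Zou and Hastie 2005). As a remark, due to the induced sparsity in SPCA, the resulting loadings deviate from being orthogonal, and consequently the corresponding sparse PCs are no longer guaranteed to be uncorrelated (Zou et al. 2006). However, engineers will likely be willing to trade uncorrelated PCs for improvements in simplicity and predictive accuracy.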
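To illustrate how a Lasso penalty drives coefficients exactly to zero, the sketch below applies the soft-thresholding operator, which is the closed-form Lasso solution for a single coefficient under an orthonormal design. It is not the SPCA algorithm of Zou et al. (2006) and not the elasticnet package; the coefficient values and the names `soft_threshold` and `count_zeros` are hypothetical and chosen only for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def count_zeros(coefficients, gamma):
    """Number of coefficients set exactly to zero at penalty level gamma."""
    return sum(1 for z in coefficients if soft_threshold(z, gamma) == 0.0)

# Hypothetical unpenalized loadings for one principal component.
loadings = [0.9, -0.4, 0.15, -0.05]

for gamma in (0.0, 0.1, 0.5):
    shrunk = [soft_threshold(z, gamma) for z in loadings]
    print(gamma, shrunk, count_zeros(loadings, gamma))
```

As the penalty level grows, more entries fall inside the threshold band and are zeroed, which is the sense in which the Lasso tuning parameters control sparsity.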

Principal Component Analysis Application
The unbonded tendon data are split into internally reinforced (internal) and externally reinforced (external) subsets, each with 17 predictor variables and the response variable, $$\Delta f_{ps}$$. The 15 predictors contained in the database are included in the analysis, as well as two additional variables, $$v_{ACI}$$ and $$v_{AASHTO}$$, which are the variable parts of the ACI and AASHTO prediction equations (ACI 2008; AASHTO 2010). These are included in the analysis in an attempt to build upon any variation in the data that these equations already explain. The ACI variable part is well known for being inaccurate, whereas the AASHTO variable part is highly phenomenological, and some variation of it is included in many design codes around the world.
The internal data have 182 observations, and the external data have 71. The variable names and types, as they are typically defined for statistical analyses (Nowak and Collins 2012), are found in Table 1. The only categorical data type is the LT variable, which is 1, 2 or 3 for single point loading, third point loading or uniform loading, respectively. Both data subsets exhibit multicollinearity among predictors in their respective sample covariance matrices, suggesting repetition of information. Due to the wide variation in scale of the different variables, the correlation matrix is chosen over the covariance matrix for the PCA.
Because PCA, unlike SPCA, has no LASSO operator to handle variable selection, multiple approaches were used to select important variables for the PCA. The initial approach consisted of simply assuming that all 17 variables were important. An eigendecomposition was applied to the correlation matrix of Eq. (7) to calculate the PCs. Figure 1 consists of scree plots showing the proportion of variation and the cumulative proportion of variation explained by each principal component for the respective data subset. An 'elbow', or change in slope between PCs (Jolliffe 2002), in the scree plot suggests a good choice for the number of PCs that express the most information while keeping the model simple, e.g. the elbow seen at three PCs in Fig. 1a. However, five principal components are selected for both the internal and external data so that models can be compared on an equal footing, and because five PCs capture a majority of the variation in the data while keeping the models relatively simple. The cumulative proportion of variation for 5 PCs is 0.80 for the internal tendons and 0.84 for the external tendons.
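The cumulative proportion of variation used to read a scree plot follows directly from the eigenvalues. A minimal sketch is given below; the eigenvalues are hypothetical round numbers rather than values from the tendon database, and the helper names are illustrative.

```python
def cumulative_proportion(eigenvalues):
    """Cumulative share of total variation explained by the leading PCs."""
    total = sum(eigenvalues)
    running = 0.0
    shares = []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        shares.append(running / total)
    return shares

def components_needed(eigenvalues, target):
    """Smallest number of PCs whose cumulative proportion reaches target."""
    for k, share in enumerate(cumulative_proportion(eigenvalues), start=1):
        if share >= target:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5).
demo_eigenvalues = [2.5, 1.25, 0.75, 0.25, 0.25]
demo_shares = cumulative_proportion(demo_eigenvalues)
demo_k = components_needed(demo_eigenvalues, target=0.80)
```

For a correlation matrix the eigenvalues sum to the number of variables, so each eigenvalue divided by that number is the proportion plotted in the scree plot.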
From the five selected principal components, linear combinations of the 17 variables can now be expressed as five new uncorrelated variables. Linear models are then fit to the data with these five new variables using tenfold cross validation. As criteria for how well the models fit the data, the coefficient of determination $$R^2$$, the adjusted $$R^2_a$$, the average ratio of measured to predicted responses $$\lambda$$, the root mean squared error (RMSE), and the mean absolute error (MAE) are calculated for each model (Kutner et al. 2004). $$R^2$$ is the ratio of the variation explained by the model to the total variation in the data, defined as
$$R^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \quad (12)$$
where $$\hat{y}_i$$ is the ith predicted $$\Delta f_{ps}$$, $$y_i$$ is the ith measured $$\Delta f_{ps}$$, and $$\bar{y}$$ is the sample average of $$\Delta f_{ps}$$. Adjusted $$R^2$$ is similar to $$R^2$$, but it is penalized for more complicated models that involve more predictors. It is calculated as follows:
$$R^2_a = 1 - \frac{n-1}{n-p}\left(1 - R^2\right) \quad (13)$$
where $$p$$ is the number of predictor variables used in the model plus one. $$\lambda$$ is calculated as the mean of the ratios of the measured $$\Delta f_{ps}$$ values to their corresponding linear model predicted values, $$\widehat{\Delta f}_{ps}$$, i.e.
$$\lambda = \frac{1}{n}\sum_{i=1}^{n}\frac{\Delta f_{ps,i}}{\widehat{\Delta f}_{ps,i}} \quad (14)$$
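The fit statistics named above can be computed as in the following sketch. It assumes the explained-over-total form of $$R^2$$ described in the text (which equals one minus the residual share for a least-squares fit with an intercept) and the standard definitions of RMSE and MAE, which the text does not spell out. The function name and the four data points are hypothetical.

```python
import math

def fit_statistics(measured, predicted, n_predictors):
    """R^2, adjusted R^2, mean measured/predicted ratio, RMSE and MAE."""
    n = len(measured)
    p = n_predictors + 1                      # predictors plus one, as in the text
    y_bar = sum(measured) / n
    ss_total = sum((y - y_bar) ** 2 for y in measured)
    ss_model = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
    r2 = ss_model / ss_total                  # explained over total variation
    r2_adj = 1.0 - (n - 1) / (n - p) * (1.0 - r2)
    ratio = sum(y / y_hat for y, y_hat in zip(measured, predicted)) / n
    residuals = [y - y_hat for y, y_hat in zip(measured, predicted)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    return {"r2": r2, "r2_adj": r2_adj, "ratio": ratio, "rmse": rmse, "mae": mae}

# Hypothetical measured values and the least-squares line fitted to them
# against x = 1, 2, 3, 4 (slope 96, intercept 10).
measured_demo = [110.0, 190.0, 310.0, 390.0]
predicted_demo = [106.0, 202.0, 298.0, 394.0]
stats_demo = fit_statistics(measured_demo, predicted_demo, n_predictors=1)
```

A ratio above 1 means the model under-predicts on average, and a ratio below 1 means it over-predicts.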
A visualization related to $$\lambda$$ is seen in Fig. 2.
A second approach was attempted by handling the continuous and categorical variables separately. While all of the variables are continuous except LT, the variables $$E_{ps}$$ and $$d'_s$$ behaved as categorical in the data and are treated as such (see Table 1). A separate PCA was computed for the 14 continuous variables and the 3 categorical variables within each data set. In order to keep the same overall number of PCs in the final models, four PCs are chosen for the continuous variables and one is chosen for the categorical variables, as seen in Fig. 1b; the fit statistics are given in Table 2. Plots of measured vs. predicted $$\Delta f_{ps}$$ are also included in Fig. 2b.
Again, the four previously calculated PCA linear models suffer because each principal component is a linear combination of all predictor variables, which is not ideal for structural design. Variable selection that admits only important variables into the PCA would allow simpler linear models with possibly better predictive power. Two additional subsets of the original variables are considered, and a model selection technique is employed and compared to the initial analysis. The first set of selected important variables is decided through professional suggestion. The authors call this set the "Self-Selected" set. The second set, called the "Correlation Cutoff" set, was selected by a test of minimum linear correlation with $$\Delta f_{ps}$$. Subsequent PCA linear models are then computed for all possible combinations of PCs as predictors, statistical significance of the coefficients is assessed via t-tests, and the final models chosen are those which achieve the highest $$R^2_a$$. The Self-Selected important variables are $$L$$, $$h$$, $$A_{ps}$$, $$f'_c$$, $$A_s$$, $$A'_s$$, $$f_{pe}$$, and $$f_{ps}$$, chosen based on the literature and experience. After a PCA is applied to these variables, the data are reduced from seven predictor variables to five. While this is only a modest gain in simplicity, the correlation between the predictors is removed. The scree plots in Fig. 1d again show that most of the information is expressed in the first five PCs chosen.
While there is a noticeable gain in the cumulative proportion of variance explained by these 5 PCs in both data sets (0.89 for the internal data and 0.98 for the external data), the final models, called PCA-SS-Int and PCA-SS-Ext, do not make similar gains in modeling the data, as seen by their respective values of $$R^2$$ = 0.26, 0.49; $$R^2_a$$ = 0.25, 0.48; $$\lambda$$ = 1.04, 1.04; RMSE = 126.34, 198.51; and MAE = 103.43, 160.44. A lack of fit to the data is seen in Fig. 2c in the models' tendency to over-predict for lower values of $$\Delta f_{ps}$$ and to under-predict for higher values of $$\Delta f_{ps}$$. This process is repeated for the Correlation Cutoff set as well. However, these variables were selected by first examining their respective linear correlations with $$\Delta f_{ps}$$. While simply selecting predictors with a significant amount of correlation with the response does not account for collinearity among predictors, the subsequent PCA handles this by producing uncorrelated PCs, and likewise for SPCA. A Pearson's product-moment correlation test is applied with a level of significance set at 0.05. Table 3 contains the correlations and p-values for both internal and external data.
Fig. 2 Measured $$\Delta f_{ps}$$ vs. predicted $$\Delta f_{ps}$$ (in MPa) from the PCA models using (a) all variables, and the (b) combined Continuous and Categorical, (c) Self-Selected, and (d) Correlation Cutoff variable subsets.
Interestingly, Table 3 indicates that for internal tendons the length is not important, whereas Mojtahedi and Gamble (1978), among others, indicate that it is. Concrete strength is not considered important, although it appears in several prediction models, including the current ACI code. The variables $$b$$, $$d_{ps}$$ and $$A_{ps}$$ are considered important and are also considered in the ACI code as the prestressing reinforcing ratio ($$\rho_{ps}$$). Interestingly, $$f_y$$ is considered important although it is not included in any known prediction model, and conversely, $$A_s$$ is not considered important, contradicting several experimental studies.
Additionally, Table 3 indicates that there are considerable differences between the internal and external data in the significance of many variables. Most notable is the 0.77 correlation between $$v_{ACI}$$ and $$\Delta f_{ps}$$ for the external tendons, as compared to the 0.42 correlation for the internal tendons. There is agreement on several variables; for instance, the loading type, the depth of section ($$h$$ and $$d_{ps}$$) and $$A_{ps}$$ are considered important, while $$d_s$$, $$d'_s$$ and $$E_{ps}$$ are not considered important, in both sets. However, the remaining variables are in contention. For instance, length is considered important in the external dataset, as are concrete strength, $$f_{pe}$$ and $$A_s$$, but not $$f_y$$. Interestingly, $$A'_s$$ is considered important in the external dataset. Furthermore, $$h$$, $$f_{pu}$$, $$A_s$$ and $$f_y$$ were found to have opposite effects on the behaviour (see the difference in signs in Table 3), indicating either very different phenomenological effects or shortcomings in the dataset.
The dataset itself is made up of all of the available experimental data, but it is also shaped by the needs of the experiments that produced it. Externally reinforced members tend to be larger bridge girders with higher reinforcing ratios and, often, $$A'_s$$. The make-up of the externally reinforced dataset reflects this and contains more beam-like members (higher $$d_{ps}$$, $$h$$, $$A_{ps}$$, $$A'_s$$, etc.), many of them simulating bridge girders. The internally reinforced dataset is made up of many more slab-like members that do not contain compression steel and are smaller, some of which are scaled (Burns et al. 1978; Six 2015). Regardless, one should be aware that the dataset, while the largest available, contains limited numbers of tests and limited variation for many variables. From this analysis, it is unclear whether the difference in variable importance is due to the dataset or to phenomenological differences. The analysis does seem to dispute the use of the same equation for internal and external members (as in ACI and AASHTO) and indicates that predictions that somehow account for the difference may be better (as in Maguire et al. 2017; Harajli 2011). If a variable exhibited significant correlation (p-value less than 0.05) with $$\Delta f_{ps}$$, it was kept for subsequent analysis. The correlation cutoff variables for the internal data are $$v_{ACI}$$, $$v_{AASHTO}$$, $$LT$$, $$h$$, $$b$$, $$d_{ps}$$, $$A_{ps}$$, $$f_{pu}$$, and $$f_y$$, and the correlation cutoff variables for the external data are $$v_{ACI}$$, $$v_{AASHTO}$$, $$LT$$, $$L$$, $$h$$, $$d_{ps}$$, $$A_{ps}$$, $$f_{pu}$$, $$f'_c$$, $$A_s$$, $$f_y$$, $$A'_s$$, and $$f_{pe}$$. The scree plots in Fig. 1e show that the cumulative proportion of variation is 0.93 for the internal data and 0.94 for the external data.
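The correlation cutoff step can be sketched as follows. This is an illustrative sketch only: it uses the standard t-statistic for a Pearson correlation and takes the two-sided critical value as an input instead of computing a p-value, the data are made up (n = 10, so the 0.05 critical value with 8 degrees of freedom is 2.306), and the function names and the two example predictors are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Test statistic for H0: zero correlation (undefined when |r| = 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def correlation_cutoff(predictors, response, t_critical):
    """Keep predictors whose correlation with the response is significant."""
    kept = []
    for name, values in predictors.items():
        r = pearson_r(values, response)
        if abs(t_statistic(r, len(response))) > t_critical:
            kept.append(name)
    return kept

# Made-up data: "h" tracks the response closely, "E_ps" does not.
response_demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predictors_demo = {
    "h": [2, 1, 4, 3, 6, 5, 8, 7, 10, 9],
    "E_ps": [5, 1, 4, 2, 6, 3, 5, 2, 4, 3],
}
kept_demo = correlation_cutoff(predictors_demo, response_demo, t_critical=2.306)
```

Comparing the absolute t-statistic with the critical value is equivalent to checking whether the two-sided p-value falls below the chosen significance level.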
By using Pearson's product-moment correlation test to remove variables that exhibit low correlations with $$\Delta f_{ps}$$, applying a PCA to the remaining predictors, and then using model selection, the linear models tend to model the data better, as seen in their respective values of $$R^2$$ = 0.52, 0.67; $$R^2_a$$ = 0.50, 0.66; $$\lambda$$ = 0.99, 1.01; RMSE = 102.36, 160.93; and MAE = 81.09, 123.49 (see Table 2). Because the PCA predictions result in very long and cumbersome equations, even when simplified (as they load all 15 of the explanatory variables), they are not presented here. However, they can be constructed using the PC loadings presented above in the PCA section.
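The model selection step used in this section, fitting a linear model for every combination of PCs and keeping the one with the highest adjusted $$R^2$$, can be sketched as below. The sketch assumes NumPy, omits the t-tests on the coefficients and the cross validation mentioned in the text, and uses small made-up orthogonal score columns rather than PCs from the database. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np

def adjusted_r2(y, y_hat, n_terms):
    """Adjusted R^2 with n_terms = number of predictors plus one."""
    n = len(y)
    sse = float(np.sum((y - y_hat) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - (n - 1) / (n - n_terms) * sse / sst

def best_pc_subset(scores, y):
    """Try every non-empty subset of score columns; keep the best adjusted R^2."""
    n, q = scores.shape
    best_score, best_cols = -np.inf, ()
    for size in range(1, q + 1):
        for cols in combinations(range(q), size):
            design = np.column_stack([np.ones(n), scores[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            score = adjusted_r2(y, design @ coef, n_terms=size + 1)
            if score > best_score:
                best_score, best_cols = score, cols
    return best_score, best_cols

# Made-up orthogonal score columns; the response depends on columns 0 and 2 only.
z0 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
z1 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
z2 = np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)
noise = 0.5 * np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
scores_demo = np.column_stack([z0, z1, z2])
y_demo = 10.0 + 3.0 * z0 - 2.0 * z2 + noise
best_score_demo, best_cols_demo = best_pc_subset(scores_demo, y_demo)
```

Because the number of subsets doubles with each added PC, an exhaustive search like this is only practical for a handful of components, such as the five used here.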

Sparse Principal Components Application
SPCA was applied to both the internal and external data sets on all of the subsets of variables, producing eight additional linear models called SPCA-All-Int, SPCA-All-Ext, SPCA-ContCate-Int, SPCA-ContCate-Ext, SPCA-SS-Int, SPCA-SS-Ext, SPCA-CC-Int, and SPCA-CC-Ext (see Table 4). In Table 4, the italic numbers indicate the selected models for the respective datasets. In all of these cases, a decision must be made about how much sparsity is desirable. Again, sparsity in the principal components is the reduction to zero of some of the coefficients, or loadings, in the linear combinations of the predictor variables.
In applying SPCA to all of the variables, Fig. 3 reveals optimal choices for the number of sparse coefficients per PC by maximization of $$R^2_a$$. Note that the variation in the external subset is explained considerably better than that in the internal subset, as seen by the consistently higher $$R^2_a$$ (Fig. 3a, b, d, e). However, Fig. 3c shows that little variation in the data is explained by the variables that were treated as categorical. More specifically, Fig. 3a suggests 2 and 1 non-zero loadings (for each SPC) for the internal and external data, respectively. The sparse loadings for all of the SPCA models are represented by heat maps found in Fig. 4, and the fit statistics are recorded in Table 4. Lastly, as in the PCA, comparisons are made between measured and predicted $$\Delta f_{ps}$$, as seen in Fig. 5a.
Due to the PCA predictions resulting in very long and cumbersome equations, even when simplified (as they load all 17 of the explanatory variables), they are not presented here. However, their SPCA versions are produced and explicitly listed in the following section.
Furthermore, the results of applying SPCA to the Continuous and Categorical, Self-Selected, and Correlation Cutoff subsets are similarly recorded and compared to the previous analysis. For each, the number of non-zero loadings per SPC is calculated (see Fig. 3), model selection is carried out, heat maps of the sparse loadings are produced (see Fig. 4), and the $$R^2$$, $$R^2_a$$, $$\lambda$$, RMSE, and MAE values are recorded (see Table 4). These linear models are listed explicitly with their respective linear combinations for each SPC. While the models are shown here with their respective PCs, alternative versions of the final suggested models, obtained with some algebraic manipulation, are presented in the following section. It should be noted that when SPCA is applied to the Correlation Cutoff variables, ten variables were retained for the external data, while only nine … (Figs. 2, 5). Some of this is also exhibited in the external data, though not as strongly. This suggests that an underlying non-linear relationship may be present in the data and that further analysis, possibly involving more advanced models, is warranted. Most notably, the $$R^2$$, $$R^2_a$$, $$\lambda$$, RMSE and MAE values are 0.54, 0.53, 1.03, 99.53, and 78.04 for the internal correlation cutoff SPCA model, and 0.70, 0.69, 0.99, 152.79, and 110.93 for the external model involving all of the variables (see italic values in Table 4). Notice that while the SPCA-CC-Int model increases $$R^2$$ and $$R^2_a$$ by 0.08 and 0.07, a noticeable amount, the SPCA-CC-Ext model does not improve over the initial SPCA for all external variables (compare the first and fourth rows of Table 4). Furthermore, after the model selection process only two terms remain in both the SPCA-All-Ext and SPCA-All-Int models (Fig. 4b, h). Hence, although it is not as reduced as the external model, the suggested SPCA-CC-Int model gives the highest predictive accuracy for the internal data.
For the external data, the SPCA-All-Ext model is recommended, since it achieves the highest predictive accuracy while retaining a simple form. Many of the under- and overpredictions made by the ACI and AASHTO models are handled better by the SPCA-CC-Int and SPCA-All-Ext models (compare Fig. 5a External with Fig. 6a, b External, and Fig. 5d Internal with Fig. 6a, b Internal).
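The fit statistics reported in Table 4 can be computed as in the following minimal sketch. It assumes that $\lambda$ is the mean of the measured-to-predicted ratios and that $R_{a}^{2}$ uses the usual adjustment for the number of regression terms; the function name `fit_metrics` and the four data points are hypothetical and are not taken from the database.

```python
import numpy as np

def fit_metrics(measured, predicted, n_terms):
    """Return R^2, adjusted R^2, lambda, RMSE and MAE for one model.

    n_terms is the number of regression terms, excluding the intercept.
    lambda is taken here as the mean measured-to-predicted ratio (assumption).
    """
    y = np.asarray(measured, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    n = y.size
    resid = y - y_hat
    ss_res = np.sum(resid ** 2)              # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_terms - 1)
    lam = np.mean(y / y_hat)
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    return {"R2": r2, "R2_adj": r2_adj, "lambda": lam, "RMSE": rmse, "MAE": mae}

# Made-up stress increases in MPa, for illustration only
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = fit_metrics(measured, predicted, n_terms=1)
print(metrics)
```

A value of $\lambda$ above 1.0 under this definition indicates that the model underpredicts on average, which is how the values of 1.03 and 0.99 above can be read.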
It should be noted that while the SPCA-All-Ext and SPCA-CC-Ext models both have two variables, with $v_{ACI}$ in common, the remaining variable differs between them ($A_{ps}$ in SPCA-All-Ext and $h$ in SPCA-CC-Ext; Fig. 4). The likely reason is that $A_{ps}$ and $h$ are highly correlated (a correlation of 0.93), so similar information is expressed in each model through collinearity (Table 5).
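The effect of collinearity described here can be illustrated with synthetic data. The sketch below is not based on the paper's database: it assumes two standardized variables (named `h` and `a_ps` only for readability) that share a common latent "member size" signal and have a population correlation of roughly 0.93, and it shows that a one-variable regression explains a similar share of the response with either variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical latent signal shared by both variables (standardized units)
size = rng.normal(size=n)
h = size + 0.27 * rng.normal(size=n)      # stand-in for beam height
a_ps = size + 0.27 * rng.normal(size=n)   # stand-in for tendon area
y = size + 0.5 * rng.normal(size=n)       # synthetic response

def r_squared(x, y):
    """R^2 of an ordinary least-squares fit of y on one variable plus intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

corr = np.corrcoef(h, a_ps)[0, 1]
r2_with_h = r_squared(h, y)
r2_with_a_ps = r_squared(a_ps, y)
print(f"corr(h, a_ps) = {corr:.2f}")
print(f"R^2 using h only    = {r2_with_h:.2f}")
print(f"R^2 using a_ps only = {r2_with_a_ps:.2f}")
```

Under these assumptions either variable can stand in for the other, which is consistent with two subsets selecting different members of a highly correlated pair.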

Simplified Prediction Equation for External data on all of the Variables (SPCA-All-Ext)
Interestingly, $v_{ACI}$ was found by the SPCA technique to be beneficial to the external prediction equations, whereas the highly phenomenological $v_{AASHTO}$, which takes into account hinging location, was found to be important to the internal model. This is not surprising, since Maguire et al. (2017) found that a calibrated version of the internal equation was most accurate, and the $v_{ACI}$ equation, although not developed for that purpose, predicts external members better than most other methods. The final SPCA prediction for external tendons relies only on the $v_{ACI}$ and $A_{ps}$ variables, the latter of which has often been found to be important in experimental studies.
Conversely, even after efforts to simplify through model selection, the final SPCA prediction for internal tendons contains seven variables, including LT, which lends some phenomenological influence. $v_{AASHTO}$ is also present and lends significant phenomenological influence. The other variables, however, are several of those whose importance is disputed in the literature.
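The algebraic step that turns a regression on (sparse) components into an equation in the original variables can be written generically as follows. The notation is introduced here for illustration and is not the paper's: $z_k$ is the score of component $k$, $w_{jk}$ its loading on variable $x_j$, $\gamma_k$ the regression coefficient, $S$ the set of components kept after model selection, and $\mu_j$, $s_j$ the mean and standard deviation used to standardize $x_j$ (standardization is an assumption).

```latex
\begin{aligned}
\Delta \hat{f}_{ps}
  &= \gamma_0 + \sum_{k \in S} \gamma_k z_k ,
  \qquad z_k = \sum_{j=1}^{p} w_{jk}\,\frac{x_j - \mu_j}{s_j} \\
  &= \Bigl(\gamma_0 - \sum_{j=1}^{p} \frac{\beta_j\,\mu_j}{s_j}\Bigr)
     + \sum_{j=1}^{p} \frac{\beta_j}{s_j}\,x_j ,
  \qquad \beta_j = \sum_{k \in S} \gamma_k\, w_{jk}
\end{aligned}
```

Any variable whose loadings $w_{jk}$ are zero on every retained component has $\beta_j = 0$ and drops out of the final equation, which is why sparse loadings combined with model selection lead to short prediction equations.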

Summary and Conclusions
PCA and SPCA linear modeling was applied to study the relationship between $\Delta f_{ps}$ and a collection of variables. The method consists of two consecutive steps: creation of uncorrelated (sparse) principal components, followed by linear regression on those principal components. Because the PCs are uncorrelated, variable selection for the linear regression is simple and straightforward. PCA/SPCA is therefore an important alternative for model selection compared with the celebrated penalized regression, which requires intensive tuning to achieve optimal performance. Furthermore, the PCs provide insight into the relationship between the outcome and the original variables.
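The two steps can be sketched with ordinary (non-sparse) PCA on synthetic data. This is a minimal illustration under stated assumptions, not the implementation used in the study: the data, the coefficient vector, and the helper `ols` are all hypothetical, and PCA is computed from a singular value decomposition of the standardized variables. The sparse variant would replace the loading matrix with sparse loadings, in which case the property shown below holds only to the extent that the sparse scores remain uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 120, 5

# Synthetic, mutually correlated explanatory variables (not the paper's data)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=n)

# Step 1: principal components of the standardized variables
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
loadings = Vt.T            # columns are the PC loading vectors
scores = Z @ loadings      # PC scores, uncorrelated by construction

# Step 2: ordinary least squares of the response on the PC scores
def ols(S, y):
    """Least-squares coefficients (intercept first) of y on the columns of S."""
    A = np.column_stack([np.ones(len(y)), S])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

coef_all = ols(scores, y)              # intercept + all five PCs
coef_sub = ols(scores[:, [0, 2]], y)   # intercept + first and third PCs only

# Uncorrelated scores mean that dropping PCs leaves the remaining
# coefficients unchanged, so terms can be selected one at a time.
score_corr = np.corrcoef(scores, rowvar=False)
print(np.round(coef_all[[1, 3]], 6), np.round(coef_sub[1:], 6))
```

The two printed coefficient pairs agree, which is the sense in which selection among uncorrelated components is straightforward: each term can be kept or dropped without refitting the others.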
The data in Maguire et al. (2017) were separated into two data sets according to whether the tendons were internal or external. Stochastic linear models based on PCA and SPCA were constructed as prediction equations for $\Delta f_{ps}$. Eight of the resulting linear models involved all the available explanatory variables, of which four handled the Continuous and Categorical variables separately. The remaining eight models used only subsets of important variables, namely the Self-Selected or Correlation Cutoff subsets. Upon comparison, the linear models using SPCA on the Correlation Cutoff variables performed best for internal tendons, and SPCA on all the variables performed best for the external tendons (see italic values in Table 4).
The following conclusions can be made from the above work:
• External and internal members show different levels of importance for the variables within the dataset. For instance, only $A_{ps}$ was considered important to both internal and external predictions in the final SPCA equations. However, $h$, $d_{ps}$, LT, $f_{pu}$, $v_{AASHTO}$, and $f_{y}$ were all considered important to internal tendons, but none were important to external tendons. The reason for this is unclear, but it is likely due to the differences in data contained in the dataset and phenomenological differences between the two structural systems. Interestingly, the influence of $A_{ps}$ is a near consensus in the literature, but the other variables are disputed.
• Based on the above conclusion and the surveyed experimental and analytical literature, there is a significant need for more data in order to obtain a better understanding, statistically and phenomenologically, of unbonded tendon reinforced members. This is ideally accomplished through additional testing, as the available database is relatively small compared to other member databases (e.g., Reineck et al. 2013).
• The SPCA-CC-Int model produced $R^{2}$ = 0.54, $R_{a}^{2}$ = 0.53, $\lambda$ = 1.03, RMSE = 99.53, and MAE = 78.04.
• The SPCA-All-Ext model produced $R^{2}$ = 0.70, $R_{a}^{2}$ = 0.69, $\lambda$ = 0.99, RMSE = 152.79, and MAE = 110.93.
• While the PCA and SPCA models performed similarly according to the $R^{2}$ and $\lambda$ metrics, SPCA combined with model selection techniques results in considerably shorter equations and produced better fit statistics.
• The PCA and SPCA analysis predicted significantly better than codified methods on the same dataset ($R^{2}$ = 0.16 and 0.08, $\lambda$ = 1.85 and 2.01 for AASHTO and ACI, respectively) and the optimized semiempirical model presented by Maguire et al. (2017) ($R^{2}$ = 0.27 and $\lambda$ = 1.34).
• The predicted stress increase, $\Delta \hat{f}_{ps}$, is consistently underpredicted for higher measured values of $\Delta f_{ps}$ in the internal data (see Figs. 2, 5). Some of this is also exhibited in the external data, though not as strongly. This suggests that an underlying non-linear relationship may be present in the data and that further analysis, possibly involving more advanced models, is warranted.

List of symbols
$A_{ps}$: area of prestressing reinforcement (mm²)
$A_{s}$: area of mild reinforcing steel on tension face (mm²)
$A'_{s}$: area of mild reinforcing steel on compression face (mm²)
$E_{ps}$: modulus of elasticity of the prestressing reinforcement (MPa)
$L$: total span length (m)
LT: loading type (1.0 for single point load, 2.0 for third point loading, 3.0 for uniform loading)
$b$: beam width (mm)
$c$: depth from compression fiber to neutral axis (mm)
$d_{ps}$: depth to prestressing reinforcement (mm)
$d_{s}$: depth to tension mild reinforcing steel from compression face (mm)
$d'_{s}$: depth to compression mild reinforcing steel from compression face (mm)
$f'_{c}$: concrete strength (MPa)
$f_{pe}$: effective stress in the prestressing reinforcement (MPa)
$\Delta f_{ps}$: stress increase in unbonded tendons (MPa)
$\Delta \hat{f}_{ps}$: predicted stress increase in unbonded tendons (MPa)
$f_{pu}$: ultimate tendon strength (MPa)
$f_{y}$: yield strength of mild reinforcing steel (MPa)
$h$: beam height (mm)
$N$: number of internal supports crossed by the tendon
$v_{ACI}$: variable part of the ACI prediction equation (MPa)
$v_{AASHTO}$: variable part of the AASHTO prediction equation
$\mu$: 100 if $L/d_{ps} \le 35$, and 300 if $L/d_{ps} > 35$
$\rho_{ps}$: prestressed reinforcing ratio
$\psi$: scaled plastic hinge length