Improvement of Side Resistance Prediction for Pile Foundation Using Construction Information

Problems such as the inclination and settlement of buildings after construction completion have been reported in recent years. Quality control during construction and mieruka (“visualizing” the work quality) are increasingly demanded. In light of this, the present study addresses execution with a “rotary penetration steel pipe pile (RPS-pile) ” , which is a system that is able to collect information in real time during the piling process. When a RPS-pile is placed, high push-in and pull-out bearing capacities are expected because the spiral blade at the pile tip resists at the bottom of the borehole against an external force. Additionally, continuous real-time data on the pile-head torque and the auger penetration depth per revolution are obtained, as these mechanical indicators are necessary for piling. This study aims at developing a method for the real-time conﬁrmation of work quality at construction sites by utilizing construction information (information obtained during piling). Speciﬁcally, a method is proposed for the sequential updating of reliability in the estimation of the side friction acting on a drilled pile. In this method, real-time information on the pile-head torque and the auger penetration depth per revolution obtained during piling are used in addition to N -values that are observed in a standard penetration test conducted in advance as part of subsurface exploration.

D r a f t Improvement of side resistance prediction for pile foundation using construction information D r a f t Because significant inclination and settlement of buildings after the completion of construction has been reported in recent years, the importance of quality control during construction has come to be widely recognized. In planning repairs and reinforcement that are required as a result of the deterioration of buildings or non-conformity caused by changes in laws and regulations, it is necessary to have a sound understanding of the initial conditions of the buildings after the completion of construction. Particularly with regard to subsurface structures, which are addressed in this study, piling is conducted without sufficient understanding of the geomaterials'characteristics, so efforts have been made to build structures in a reliable manner by means of an observational construction control system that utilizes gathered information. However, the information collected during construction is usually utilized for detecting abnormalities and not for quantitative assessment of the work quality or the structure performance. Therefore quality control during construction and mieruka ("visualizing" the work quality) are increasingly demanded. This problem is addressed in this study. Specifically, by utilizing rotary penetration steel pipe piles (RPS-piles), methods are examined for collecting basic information on the structure performance that is necessary for observational construction control and for decision-making regarding repairs and reinforcements after construction. For this purpose, information collected during construction is related to structure performance. A RSP-pile is a steel pipe pile with a spiral blade at the pile tip. Fig. 1 shows a schema of a RSP-pile. Torque given to the pile head helps the pile auger penetrate the ground. Because this piling method ensures low vibration and low noise and does not require earth removal, it can be implemented even on small plots of land, and the effects on the surrounding area are limited. The characteristics of RPS-piles are as follows: large push-in and pull-out bearing capacities are expected because the spiral blade helps to resist at the bottom of the borehole against an external force; and continuous data on the pile-head torque and the auger penetration depth, which are mechanical indicators necessary for piling, are obtained in real time (Mori (2003), Saeki and Ohki (2003), Shioi et al. (2013) and Kawai et al. (2018)). In this paper, a basic concept is examined regarding the real-time estimation of the vertical bearing capacity of RSP-piles. Then, a method is proposed for the sequential updating of reliability in the estimation of the side friction acting on a drilled pile at various depths. In this method, real-time information on the pile-head torque and the auger penetration depth per revolution obtained during piling is used in addition to N -values that are observed in a standard penetration test in advance.

Importance of this study in light of previous studies
In the Specifications for Highway Bridges Part IV (JRA 1996), a major design standard in Japan, the ultimate push-in bearing capacity is calculated by using the design model shown by Eq. (1) where, R u = the ultimate bearing capacity of the pile (kN); R tip u = the ultimate bearing capacity of the pile tip; R side u = the ultimate bearing capacity of the pile side friction; D r a f t  q d = the ultimate bearing capacity per unit area at the pile tip (kN/m 2 ) (hereinafter, tip resistance); A p = the area of the pile tip (m 2 ); U = the circumferential length of the pile (m); L i = the thickness of the layer for which side friction acting on the drilled pile is taken into account (m); and f i = the ultimate unit side friction acting on a drilled pile in the layer for which the side friction acting on the drilled pile is taken into account (kN/m 2 ) (hereinafter, side resistance). The tip resistance (q d ) and the side resistance (f i ) vary depending on piling method and soil type. In Fig. 1c, Modeling of the ultimate bearing capacity is shown. When a RSP-pile is utilized, q d is calculated by using Estimation Eq. (2): In this equation, α q d is a coefficient of tip resistance (Table 1). This coefficient is determined according to D w /D p (the two types of piles, D w /D p =1.5 and 2.0, are in the market as the factorial products), the ratio of the blade diameter (D w ) to the pile diameter (D p ), and also on the basis of the type of soil that the pile tip reaches. N is N -values of standard penetration test. Estimation equations [Eq.
(3) and Eq. (4)] give f i . f i is estimated by N -values and regression coefficient α fi .
Sandy soil : where, c is the soil cohesion(kN/m 2 ).Also, a regression coefficient α fi is 3 for sandy soil and 10 for cohesive soil. Okahara et al. (1991) collected data from many vertical pile loading tests that were conducted in Japan, and they quantified the modeling error of the design equation [Eq.(2), Eq.(3) and Eq.(4)] shown above. In their study, the ultimate bearing capacity used for analysis was defined as the load which was applied when the settlement depth was 10% of the pile diameter. They also conducted statistical analysis of the bearing capacity ratio (A/B). Here, A is a value of the ultimate bearing capacity observed in a vertical loading test. B is the ultimate bearing capacity calculated by using a bearing capacity estimation equation on the basis of N -values. The analysis results were first divided into those for cast-in-place concrete piles (CCP) and those for steel pipe piles (SSP), and the data were further divided into those for bearing piles and those for friction piles. Consequently, the data were stratified, and statistics are shown for each stratum (Okahara et al. 1991). According to their analysis, the coefficient of variation of the ultimate bearing capacity of the pile tip (COV R tip u ) and of the ultimate bearing capacity of the pile side friction (COV R side u ) are 0.36-0.46 and 0.53-0.63 respectively, the latter being slightly greater. Bias, which is defined as the mean value of A/B, is 1.01-1.21. This empirical method suggests that the design model slightly underestimates the ultimate bearing capacity and has large uncertainty.
Furthermore, empirical methods for checking the pile foundation stability that are similar to the methods described in 4-studies have been used in major design standards in Japan and elsewhere (AIJ 2000;OCDI 2009) as well as by AASHTO (AASHTO 1989(AASHTO , 2000(AASHTO , 2004. For the purpose of revising the AASHTO standards, Paikowsky (2004) developed a large database of vertical pile loading test results that were chiefly collected from the East Coast of the USA and also from abroad, and analyzed the modeling error of a pile design equation. Paikowsky (2004) collected and analyzed dynamic loading test results in addition to static loading test results, but the static loading test results alone are shown below. The ultimate bearing capacity is defined by using Davisson's criterion. In Paikowsky's analysis, modeling error correction is done by considering the soil type alone, without separating the side resistance and the tip resistance. Pile types and piling methods are not taken into consideration. Paikowsky reported that bias Ru was 1.12 and COV Ru was 0.59 for sandy soil, and that bias Ru was 0.83 -0.84 and COV Ru was 0.35 -0.39 for cohesive soil and alternating strata. These results indicate that the calculated values are slightly lower than the observed values for sandy soil and are higher than the observed values for cohesive soil. The coefficient of variation varies depending on the soil type. Furthermore, in recent years, the modeling error of the ultimate bearing capacity of piles in the vertical direction has been discussed, and it has been reported that the magnitude and the uncertainty of the ultimate bearing capacity varies depending on the soil type (Tang and Phoon (2018) and Tang and Phoon (2019)).
A piling technique that utilizes RSP-piles was developed around 2000; thus, the aforementioned studies did not deal with this relatively new piling technique. Fig. 2 shows the results of loading tests (PWRC 2009(PWRC , 2011JICE 2004) which were performed in the process of developing this piling technique. Values of f i and N -values observed in standard penetration tests at piling sites are shown for each soil type and D w /D p . Estimation Eqs. (3) and (4), which are included in the current design standards of Japan, for f i of RSP-piles are also presented in Fig. 2. Fig. 3 show the histgrams of α fi depending on the soil type. It is shown that the current design equation has large uncertainty on the relationship between f i and N -values. Regarding the statistics of A/B, bias fi = 1.63 and COV fi = 0.97 for sandy soil, and bias fi = 2.06  Figure 2: f i vs. SPT-N value determined by the specification's calculation equation and COV fi = 0.97 for cohesive soil. In estimating an f i value from an N -value for a RSP-pile as well as for a conventional pile foundation, simple estimation equations involve large uncertainty. As mentioned above, the empirical method of the ultimate bearing capacity of the pile in the vertical direction generally has large uncertainty. Although the magnitude of bias fi and COV fi are also big problem, it should be noted that the degree of the side friction varies depending on the soil type. At the piling sites, the soil layer composition of a certain point will be measured by some soil investigations such as standard penetration test etc., but the soil layer composition of the enforcement points of all piles will be not necessarily clarified. The risk caused by this information uncertainty has the potential to cause major problems such as significant inclination and settlement of building after the completion of construction.
This study aims at developing a method for the real-time confirmation of work quality at construction sites by utilizing information obtained during piling against this large uncertainty. Specifically, a method is proposed for the sequential updating of f i taking into account the characteristics of the piling site.
2 Data used in this study

Development and outline of a database
The database used in this study consists of the following: datasets contained in written reports on the review and certification of this piling technique (PWRC 2009(PWRC , 2011JICE 2004) (vertical loading test results available); and piling data collected at many piling sites in Japan (vertical loading test results unavailable). The database includes vertical loading test results, construction information (piling data), and soil investigation data (i.e., soil types and standard penetration test results). Because this study addresses real-time data updating for the side friction acting on a drilled pile, the vertical loading test results in the database are those of f i alone. Table 2 shows the basic specifications of the pile used for analysis and the indices   Table 6 in Appendix-A. As shown in Table 6 in Appendix-A, the piling sites to be able to measure all indices were only three (#1 -#3). This means that the number of datasets that contain observation data of all indices is limited and that the majority of the datasets are incomplete, lacking observation data for some indices. In particular, the amount of data obtained in loading tests regarding f i is much smaller than the amount of data on other indices. Construction information (piling data) and soil investigation data (i.e., T , S, and N ) are always collected in the piling process and thus are included in many datasets. Additionally, in loading tests, the interval between strain gauges varies depending on the test; thus, the amount of data available per test differs from index to index. The database used in this study is incomplete with regard to some indices, which poses a challenge that needs to be addressed in conducting the analysis.
In addition, f i is evaluated with reference to Okahara et al. (1991). Specifically, the ultimate bearing capacity is defined as the vertical load that is observed when the settlement depth is 10% of the blade diameter of a RSP-pile in this study.  Piling data such as T (pile-head torque) and S (auger penetration depth) are regarded as continuous data because the interval between observations is short, whereas N and f i are discrete data that are taken at a longer interval. Because the observation depth is not consistent for continuous data nor for discrete data, continuous data and discrete data need to be correlated in terms of the observation depth in order for us to correctly understand the relationships between indices.
For this purpose, continuous data taken from a certain range of depths are averaged so that each index is understood according to the observation depth of discrete data. Fig. 4 shows an example of observation data. Black lines show continuous data of each index, blue dots and red triangles respectively indicate discrete data in accordance with the observation depths of N and f i . In a standard penetration test, N -values are observed every 1m. Thus, as shown in Fig. 4 the mean value of the continuous data distributed in a range of ±0.5m of the observation depth of an N -value is used as the value corresponding to the observation depth of that N -value. Regarding the value of f i , the mean value of observations taken in the range of depth between two strain gauges is used as the value corresponding to the observation depth of f i . The depths of strain gauges are shown by solid gray lines.

Definition of the normalized pile-head torque T ′
On the assumption that T is mainly generated by friction between the pile tip and the bottom surface of the borehole, T is normalized by using the cube of the blade radius as follows.
The blade radius is r w (m), the facial pressure on the soil-pile interface of the pile tip isp tip (kN/m 2 ), and the friction coefficient of the soil-pile interface is µ g (Fig. 1a D r a f t and Fig. 1b). The differential torque (dT ) acting on the derivative area of the pile tip (ds) is expressed by Eq. (5).
Where, the differential area of the pile tip (ds) and the differential friction (df ) acting on that area. Thus, the torque (T ) applied to the pile-head is given by Eq. (6).
Then, r w , which is the pile structural parameter, is transferred to the left side as below.
In this equation, C is a proportionality constant, T ′ is the normalized torque by using the cube the blade radius. Incidentally, f i is given by Eq. (8).
Here,p side is the facial pressure on the soil-pile interface of the pile side. From Eq. (7) and Eq. (8), the relationship between T ′ and f i is estimated as a linear relationship as below.
Here, on the simple assumption thatp tip /p side is constant according to the location (depth) like the earth pressure coefficient, the relationship between f i and T ′ is considered to be in a linear relationship. Therefore, in this study, this normalized torque T ′ is used as the index for deriving the estimation equation of f i (α fi ).

Criteria for the data screening
Screening is applied to the database on the conditions that (1) T and S are non-negative, and (2) N is non-negative and less than 50. Regarding (1), some values fluctuate on a short cycle in the direction of depth, and some are negative. These values are regarded as observation errors and are excluded from the database. As for (2), it is probable that N -values observed as 50 were interpreted differently at various piling sites or by different measures. At one piling site, for example, all N -values observed as 50 or greater were recorded as 50. Thus, N -values recorded as 50 or greater are excluded, as these are not reliable data.
3 Study method 3.1 Flow of the study Ching and Phoon (2012, 2014 effectively solved various estimation problems by using multivariate normal distributions for the purpose of modeling multidimensional soil data. In Ching and Phoon (2012), by using Bayes'theorem in a skillful manner, a multivariate estimation equation for undrained shear strength of clay was successfully derived from various empirical equations and databases. This example D r a f t is similar to the work in this study, in that an incomplete dataset is used to derive a rational estimation equation. In Phoon (2013, 2014), diverse information on physical and strength tests of clay was used to create a database, and the validity of the authors'modeling approach was suggested. Soil data do not necessarily follow a normal distribution. Phoon (2013, 2014) proved that by converting soil data distributions into multidimensional normal distributions, linear relationships between indices are obtained and that multidimensional data having complex non-linear relationships can be interpolated and estimated in a simple and rational manner. Based on these previous studies, this study utilizes the aforementioned incomplete database in order to create a method for estimating f i . The flow of this study is shown in Fig. 5. After a database is created, two methods for estimating f i are proposed.

Method for creating an estimation equation by utilizing the conditional normal distribution
As mentioned above, observations for all indices were taken at only three of the piling sites (#1 -#3.) The least squares method (LSM) in regression analysis that can be used for deriving an equation for estimating f i require that data be available for all indices that correspond to the variables in the equation. Thus, only the data collected at the three piling sites can be used for deriving an estimation equation based on LSM.
In view of the above, for the purpose of making the best possible use of the collected data and of proposing a rational and valid equation, a conditional multivariate normal distribution (Bishop 2006) was utilized in order to derive an estimation equation for f i . The relationship between the conditional multivariate normal distribution and the linear model is explained as follows.
First, x is defined by the D-dimensional random variable vector [Eq. (10)].
When x follows normal distribution, its probability density function is expressed by D r a f t Eq. (11) (i.e., a D-dimensional multivariate normal distribution).
where, µ and Σ are parameters of the D-dimensional multivariate normal distribution. µ is a D-dimensional mean vector [Eq. (12)].
Σ is the covariance matrix for a D × D symmetric matrix in Eq. (13).
The covariance matrix (Σ) needs to be a positive definite matrix.
Next, x is divided into the two subsets x 1 and x 2 , as expressed in Eq. (14).
x 1 is an M -dimensional vector consisting of the elements from the first to the M -th elements (unknown parameters; herein objective variables), and x 2 is a (D − M )dimensional vector consisting of the remaining elements (known parameters; herein explanatory variables). Based on these subsets, the mean vector (µ) and the covariance matrix (Σ) are respectively defined by Eqs. (15) and (16). Because Σ is a symmetric matrix, Σ 11 and Σ 22 are also symmetric matrices, and Σ 21 = Σ T 12 . Under the condition that x 2 , a subset of random variables, is given, f (x 1 |x 2 ), the conditional distribution of x 1 , is considered to be as follows.
Eq. (18) is an example of the linear model defined by Eq. (20).
D r a f t By using the regression coefficients β 1 and β 2 in the linear model, Eq. (18) is expressed in the form of Eqs. (21) and (22).
Thus, the creation of an estimation equation having f i as an objective variable is possible when the mean vector and the covariance matrix, the parameters of the multivariate normal distribution, can be estimated. In order to apply this idea for the creation of an estimation equation, because x does not necessarily follow a normal distribution, the distribution of x must be converted to a normal distribution based on Box-Cox transformation (Box and Cox 1964) or Johnson transformation (Johnson 1949).

Approximation of an estimated covariance matrix by a positive definite matrix
In order for a covariance matrix to be a parameter of multivariate normal distribution, the covariance matrix must be a positive definite matrix. However, an estimated covariance matrix is not always a positive definite matrix when the covariance matrix is calculated by using an incomplete dataset, or when each element of the covariance matrix is independently estimated for estimating the parameters, as in this study. Thus, techniques for approximating an estimated covariance matrix by a positive definite matrix are explained below. First, assume that Σ est is the estimated D × D symmetric covariance matrix. Through eigenvalue decomposition, Σ est can be expressed by Eq. (23).
Λ is a diagonal matrix (Eq. (24)) in which the eigenvalues of Σ est are diagonal components when these eigenvalues are {λ 1 , · · · , λ D } in descending order of the values.
By using Λ ′ and U , a positive definite matrix Σ ′ est is defined by Eq. (27).
Because all eigenvalues are positive real numbers, Σ ′ est is a positive definite matrix. When the estimated variance-covariance matrix Σ est is not a positive definite matrix, the positive definite matrix Σ ′ est is used as an approximate matrix of Σ est for developing an estimation equation.
With the increase in the value of γ that replaces a negative eigenvalue, the difference between the estimated Σ est and the approximated Σ ′ est increases. Thus, the value of γ is determined through trial and error such that the value of γ will not significantly affect the estimation accuracy of the estimation equation. In this paper, the value of γ is 90% of the smallest eigenvalue among positive eigenvalues (Eq. (28)).

Methods for estimating a mean vector and a variance-covariance matrix
The estimation equation developed in this study has an objective variable (f i ) and explanatory variables N , T ′ and S. The mean vector and the covariance matrix of the multivariate normal distribution for estimation are expressed by Eqs. (29) and (30).
Two methods for estimating the mean vector and the covariance matrix are used, as explained below.

Estimation Method 1
First, the database mentioned above is used for calculating the mean value, the standard deviation value, and the correlation coefficient of each index. On the basis of these values, the mean vector and the covariance matrix are created. Because the amount of data varies depending on the index, mean values and standard deviation values are calculated by using the data available for each index, and correlation coefficients are computed by pairwise elimination.
D r a f t Figure 6: The calculation for correlation coefficient by pairwise elimination Fig. 6 outlines the calculation of correlation coefficients by pairwise elimination. In this method, as shown in Fig. 6, when there are missing data in each dataset of the variables x 1 , x 2 , and x 3 , correlation coefficients are calculated by using simultaneously available observed data. For example, ρ x1x2 , the correlation coefficient for x 1 and x 2 , is calculated by using the observed values in the dotted-line region of Fig. 6a, while ρ x2x3 , the correlation coefficient for x 2 and x 3 , is calculated by using the observed values in the dotted-line region in Fig. 6b. Then, the mean value, the standard deviation value, and the correlation coefficient that are calculated for each index are used to create a mean vector and a covariance matrix.

Estimation Method 2
First, a prior distribution is created for the parameters of each index. The prior distribution is updated to a posterior distribution by utilizing Bayes'theorem with a likelihood determined on the basis of observed data. The mean value of the posterior distribution is a representative value which is used as an estimated value of a parameter.
The basic idea behind using Bayes'theorem to update an estimation equation is as follows. A parameter vector θ is a random variable, and the probability distribution of θ is π(θ). The probability distribution π(θ) is a prior distribution in Bayesian updating. The prior distribution π(θ) is updated to a posterior distribution π(θ|D) on the basis of a likelihood f (D|θ) that corresponds to observed data D. The posterior distribution π(θ|D) of the parameter θ is expressed by Eq. (31).
The data D used for updating are {N, T ′ and S} which are observed at piling sites. Because these three indices are assumed to follow normal distribution, the likelihood f (D|θ) is calculated by a trivariate normally distribution. The parameters for updating are the mean value (µ), the standard deviation value (σ), and the correlation coefficient (ρ) for each of {N, T ′ and S}; thus, the parameter vector (θ) is a nine-dimensional D r a f t The parameters relating to f i , or µ fi , σ fi , ρ fiT ′ , ρ fiT ′ , ρ fiS , are calculated by using Estimation Method 1 above. It should be noted that Estimation Method 2 is interpreted as an application of empirical Bayes using the posterior mean of the estimated parameters.

Method for generating a prior distribution
From an engineering viewpoint, prior distributions are generated by the database for the following nine parameters which are elements of θ: In generating a prior distribution, the bootstrap method (Efron and Tibshirani 1993) is used. In this method, a sample distribution of parameters is created by subjecting sample data to "random sampling with replacement". In the example shown in Fig 7, bootstrap samples (1) to (B) {V 1 * 1 , V 1 * 2 , · · · , V 1 * B } are created by applying random sampling with replacement B times, with possible duplication, to the sample data (V 1 ) of Site 1.
θ is one of the elements of the parameter vector θ. A sample parameterθ 1 of Site1 is calculated for each bootstrap sample, and estimated parameters {θ 1 (V 1 * 1 ),θ 1 (V 1 * 2 ), · · · ,θ 1 (V 1 * B )} are obtained. This set of estimated parameters is a sample distribution of the estimated parameters for Site 1. The same procedure is followed for each piling site in order to create a sample distribution of the estimated parametersθ for each site. Then, all sample distributions ofθ obtained at each site are synthesized into one sample distribution. The operation iterated for each element of θ to create nine synthetic sample distributions. Based on the variable range of each parameter, the parameter's synthesized sample distribution are modeled by the probability density function shown in Table 3 by using a maximum likelihood method.
D r a f t The sample distribution is formulated and is used as a prior distribution (π(θ)) of the parameter θ.
In here, the most characteristic point is that the beta distribution (Eq. (33)) is adopted as the probability density function of the prior distribution for the correlation coefficient, as shown in Table 3. Furthermore, as shown below, the variable domain is expanded and applied in order to correspond to the variable domain for the correlation coefficient (Four-parameter beta distribution).
In Eq. (33), p and q are parameters of a beta distribution, p > 0 and q > 0. The condition of a variable range is 0 < x < 1. B (p, q) is a normalization constant that is defined by Eq. (34). This constant is used for obtaining a value of 1 by integration.
In Four-parameter beta distribution, the variable range condition of a beta distribution expressed by Eq. (33) is extended from 0 < x < 1 to α 1 < x < α 2 . The probability density function of the four-parameter beta distribution is expressed by Eq. (35).
Usually, the variable range of a correlation coefficient is −1 ≤ ρ XY ≤ 1. But in this study, it is assumed that indices do not perfectly correlate. Therefore, α 1 = −1 and α 2 = 1 are given to Eq. (35), and the probability density function of the four-parameter beta distribution , in which 0 < x < 1 is extended to α 1 < x < α 2 , is used as the prior distribution of a correlation coefficient.

Method for updating a prior parameter distribution based on observation data
A prior parameter distribution is updated to a posterior parameter distribution by means of particle filtering (Ristic et al. 2004). Particle filtering is a Sequential Monte Carlo method (Doucet et al. 2001). In this method, a posterior distribution is estimated on the basis of observation data by using Bayes'theorem. As mentioned above, the observation data are the observed values of the three indices {N, T ′ , and S}.
These observation data are defined as a data matrix D in Eq. (31). Posterior distribution π(θ|D) is estimated by resampling from the prior distribution π(θ), with the D r a f t likelihood f (D|θ) obtained the data matrix D as the weight. The prior distribution π(θ) is updated at an interval of 5m depth in order to obtain a posterior distribution π(θ|D).
The mean value of the posterior distribution is used as a new parameter of the multivariate normal distribution, and a sequential estimation equation is updated by using Eq. (18) for estimating f i . The updating interval of 5m depth was chosen by trial and error in order to obtain a desirable estimate.
Points to note are explained below regarding the representative values of estimates and the methods for updating posterior distributions. Strictly speaking, the updating of a prior distribution π(θ) needs to be done by using the likelihood f (D|θ) based on a trivariate normal distribution. However, this method results in a large number of parameters whose distributions need to be updated at one time, which makes it difficult to estimate posterior distributions by particle filtering. Therefore, we simplify this issue by updating the bivariate normal distribution for each combination of two parameters (i.e., N -S, N -T ′ and S -T ′ ). This simplified method has the problem on the updating operation regarding that two different posterior distributions are obtained for the mean values {µ N , µ T ′ , µ S } and standard deviations {σ N , σ T ′ , σ S }, which are subelements of parameter θ. The mean values of the representative values (i.e., the mean values and standard deviations) of these two distributions are used for creating the estimation equation for each depth.
The updating method is based on the simple assumptions mentioned above. Additionally, particle filtering requires large periods of time for processing reliability updating and is not suited to the real-time estimation of reliability. Because this study is a basic study on the validity of reliability updating in which real-time piling data are used, improvements in the strictness of updating techniques and in the calculation cost will be addressed in future studies. Fig. 8 shows pair scatterplots, histograms, correlation coefficient and normal Q-Q plots after data for each index are processed. Data processing means normalization of pilehead torque T and screening. In these pair scatterplots, the mean value of each index is shown in red.

Data processing results
The histograms show that each index is distributed asymmetrically and extends toward the right.In the Q-Q plots, none of the indices follows a normal distribution. To acquire the data followed normal distribution, logarithmic conversion is applied to each index because this is a simple technique for transforming variables as well as for interpreting transformed variables. Fig. 9 shows pair scatterplots, histograms, correlation coefficient and normal Q-Q plots for each index that was subjected to logarithmic conversion. It is shown in Fig. 9 that the distributions of indices after logarithmic conversion are mostly similar to normal distributions. The pair scatterplots show that a joint distribution of two indices has the mean value in the center of the distribution, which suggests that the joint distribution is also similar in shape to a normal distribution after the indices are subjected to logarithmic conversion. Although there are various conversion methods to normal distribution, we adopt logarithmic conversion in consideration with the small number of data. It is one of the reasons that the equation can be transformed to a simple equation that can be easily understood by the engineers, Where, β 0 , β T ′ and β S are regression coefficients. Note that, considering the subsequent exponential transformation [Eq. (39)], we denote the constant part as ln(β 0 ). To create the estimation equation, a mean vector [Eq. (40)] and a covariance matrix [Eq. (41)], which are parameters of each index to which logarithmic conversion was applied, are estimated.
Eq. (36) becomes Eq. (39) when the former is converted to a linear axis. The simplicity of Eq. (39) is one of the reasons for using logarithmic conversion to acquire normal distributions of indices.
D r a f t

Estimation results obtained by Estimation Method 2
In estimation method 2, x = (ln f i , ln N, ln T ′ , ln S) T obtained based on logarithmic transformation to each index is also used as a explanatory variable following the multivariate normal distribution. Thus, the parameter vector θ for creating the prior distribution π(θ) is Eq. (45). Fig. 11 shows the histgram of each index generated by means of the bootstrap method by using the data collected at piling sites. The data are contained in the database that is used for modeling. Parameters of prior distributions are identified by using a maximum likelihood method.Identified parameters of prior distributions are shown in Table 4. In Fig. 11, generated prior distributions are also superimposed (black line). Table 5 shows a regression coefficient {β 0 , β N , β ′ T , β S }, a estimation error of an estimate on logarithmic axisσ ln fi , and a coefficient of variance COV fi for each depth of estimation in the f i estimation equation created by using Estimation Method 2. Because the estimation equation is updated for every 5m of depth, the results are shown for each 5-m section.     4,5063!%& It should be noted that the proposed method does not necessarily reduce the uncertainty of the estimation equationσ ln fi . Unlike in a general direct estimation for coefficients of an estimation equation, the reason is because the proposed method is to be the estimation method for parameters of the multi-dimensional normal distribution taking into account the characteristics for each depth. In addition, in the proposed method at present, the estimation error in these parameter themselves estimations are not take into account in calculating ofσ ln fi . Fig. 12 shows the estimates of f i that are estimated by using the estimation equation created by means of Estimation Method 2. Red dots indicate observed values of f i that were obtained in vertical loading tests. Green lines show the results estimated by using prior distributions of parameters. Blue lines show estimates f i obtained by updating the estimation equation through Bayesian updating. Light blue regions indicate estimation error ranges for Estimation Method 2. Here, the estimation error ranges in Fig. 12 show the ranges simply conducted exponential transformation for ln f i ±σ ln fi of an estimate on the logarithmic axis. For every piling site, the results of Estimation Method 2 better close to the values observed in vertical loading tests than do the results of Estimation Method 1. Therefore, by using Estimation Method2, which is a method that is based on reliability updating that utilizes real-time piling data, bearing capacity can be estimated in real-time at the same time an RPS-pile is being placed. Furthermore, data for computing reliability analysis can also be acquired.
D r a f t

Conclusions and future issues
The study described in this paper aims at developing a method for real-time updating and verification of the bearing capacity reliability of RSP-piles. For that purpose, piling data observed onsite were effectively utilized. First, vertical loading test data of RSP-piles, soil investigation data, and piling data were collected for the purpose of creating a database consisting of observed values of side resistance f i , N -values obtained in standard penetration tests, and piling data (i.e., pile-head torque T and auger penetration depth per revolution S).
The database was incomplete because the available vertical loading test data were limited and a majority of datasets did not include observation data for some indices. Thus, the utilization of the incomplete database was a central topic of the study before an equation was derived for estimating the ultimate bearing capacity by effectively using piling data. Regarding this topic, a conditional multivariate normal distribution was used for testing the following two estimation methods, and the estimation results were examined.
(1) Estimation Method 1: A pairwise elimination method was applied to sample data in order to set a mean vector and a covariance matrix. Then an estimation equation for estimating f i was derived on the basis of N , T ′ , and S by using conditional multivariate normal distribution.
(2) Estimation Method 2: Prior distributions of parameter vectors were estimated. A prior distribution was created by using the bootstrap method for each site. In this method, a reliability estimate of ultimate bearing capacity is updated in real time on the basis of Bayes'theorem by using piling data that are observed at depth intervals of 5m.
This suggests that real-time updating of the reliability of bearing capacity estimates onsite is possible. The practical application of the estimation method verified above will be investigated in the future. For that purpose, it is necessary to create more accurate estimation equations and prior distributions, and the database needs to be expanded towards ensuring the validity of the analyses of the estimation results. Studies will be also conducted from the viewpoint of geotechnical engineering. Specifically, statistical methods such as principal component analysis will be utilized for analyzing correlations.

A Data used for the analyses
For the purpose of indicating the reliability of the statistical analyses described in this paper, the dimensions of the rotary steel pipe pile used for analyses are listed in Tab. 6. Regarding the datasets that include piling data, a depth distribution of each index is shown in Fig. 13. B Changes in simultaneous probability density resulting from updating by using piling data (Estimation Method 2) The diagrams in Fig. 14 show changes in the probability density of each variable. In Estimation Method 2, these changes are caused by updates to the parameter distribu-D r a f t tions by means of Bayes'theorem, with the likelihoods determined on the basis of piling data. Due to space constraints, changes in the probability density are shown only for Site 1. The estimation equation is updated by using a conditional distribution. For this purpose, the probability density (shown in blue) according to the estimated posterior distribution of each parameter is used.
D r a f t D r a f t