Planning low-error SHM strategy by constrained observability method

Structural identification using dynamic parameters (such as natural vibration frequencies and mode shapes) is an important issue, especially in bridges and high-rise buildings. However, incorrect decisions on the Structural Health Monitoring (SHM) strategy and the Structural System Identification (SSI) analysis can render this sometimes expensive and time-consuming process useless because of the large uncertainty of the resulting estimations. This paper discusses the role of the SHM strategy and the SSI analysis, based on the constrained observability method (COM) and decision trees (DT), in reducing the estimation error. Here, the COM uses subsets of natural frequencies and/or mode shapes to deal with the nonlinearity of the SSI derived from the operational aspects of the method, and combines the unknown items, including frequencies and mode shapes, into an optimization process. Next, a decision-support tool based on decision trees is applied to help engineers establish the SHM + SSI strategy that yields the most accurate estimations. The principle and steps of this new method, the combination of the constrained observability method and decision trees, are presented for the first time. After that, a numerical model of a bridge is used to show how to choose the optimal strategy when factors such as the structural layout, span length, measurement set, and parameters of the COM are included as decision variables. A sensitivity analysis of the COM estimations ranks the importance of these four factors as follows: layout, measurement set, parameters of the COM, and span length. Last, a real bridge is used to validate this methodology under undamaged and damaged scenarios by comparing an error index, which shows that the optimal SHM + SSI strategy works well whether or not the bridge is damaged.
The presented analysis leads to significant insights that can support the choice of the optimal SHM + SSI strategy and help avoid the erroneous decisions that may be made when this tool is not used beforehand.


Introduction
Structural system identification (SSI) is the process of modeling or updating an unknown or imperfectly known system [1] and thus identifying its unknown structural parameters based on the analysis of the structure's performance when excited statically or dynamically [2]. SSI may be categorized into two groups based on the type of excitation, dynamic and static, depending on whether inertial effects are involved. Examples of specific techniques applied to each of them can be found in [3][4][5]. The unknown characteristics of a structure include Young's modulus, area, inertia, mass, stiffness and damping properties. Since SSI can estimate the properties of the structural members, damage identification can be realized by comparing the expected (non-damaged) and observed mechanical properties based on the structural response [6], as well as through Structural Health Monitoring (SHM) [7,8]. For this reason, SHM and SSI have attracted massive research interest in recent years.
Lozano-Galant [9,10] proposed the observability method (OM) for SSI from static tests. This technique transforms the stiffness matrix into a monomial-ratio system of equations and enables the mathematical identification of the stiffness profile of the entire structure or a portion of it using a subset of deflection and/or rotation measurements. After that, numerical OM [11] and constrained observability method (COM) [12] were developed on static SSI. This mathematical approach has been used in many fields, such as hydraulics, electrical, and power networks or transportation [13,14]. Simultaneously, the OM [15] and COM [16] on dynamic SSI verified their applicability through several beam and frame cases. The dynamic COM addresses the nonlinearity of the SSI methods by using subsets of natural frequencies and/or modal shapes. Obtaining natural frequencies and mode shapes is limited to the case when no traffic is present on the bridge [17], and the effect of environmental changes (mainly temperature) in the recorded sensor data can be easily processed and removed by a Principal Component Analysis (PCA) [18] or similar, before the application of the method. The dynamic COM combines the unknown parameters including the theoretical frequencies and mode shapes into an optimization process that minimizes the objective function obtained as the squared sum of the frequencies-and mode shape-related errors.
The selection of the objective function to minimize in the optimization process has a profound impact on the problem output. It does not only affect the interpretation of the best correlation between the unknown parameters but also influences the performance of the selected optimization algorithm. Normally, eigenvalue residual and consideration of the modal assurance criterion MAC related function are used, as well as the residual vector of the deviation from the orthogonality of the experimental mode shapes to the theoretical ones [19]. Most of the sensitivity-based approaches reported for finite element updating of real case studies have considered only the eigenvalue or frequency residual [20,21]. Additionally, some papers are concentrated on the multiobjective identification method by dividing the frequencies or eigenvalue residual [22] and mode shape-related residual as two parts to estimate the structural parameters [23,24]. On the other hand, some researchers establish weighted multi-objective functions considering frequency residual, mode shape-related residual, and modal flexibility residual together [25], the majority of them giving equal weights to each residual [26]. Accordingly, the results of the optimization are affected by the values given to the weighting factors of the objective function. The dynamic SSI in this paper is using the optimization process of dynamic COM, thus defining an objective function of eigenvalue residual and MAC related residual. However, weighting factors of these two residuals are considered as different values, investigating their influence on the final estimated parameters.
A major goal of conducting SHM and the subsequent SSI is to derive conclusions about the real state of a given structure. Whereas the SHM focuses on monitoring the actual structural response, SSI aims at determining the actual mechanical properties of the structure based on the observed response. Both the monitoring strategy and SSI analysis play an essential role in the uncertainty level of the estimated features. This combined approach is not common in the literature. For instance, references [27][28][29] show how the sensors' accuracy, the optimal placement of sensors, and how they are combined highly influence the quality of the estimation. However, these analyses do not consider both the monitoring setting and the characteristics of the method used for SSI as design variables at the same time. Improving one of the sides, i.e., either the conditions of monitoring or the definition of the model used for SSI, does not guarantee the most accurate estimation, which can be obtained if both of them are combined. An adequate combination of the monitoring strategy and SSI analysis can yield a more accurate estimation of the structural parameters. Making the right decisions when designing a strategy that combines both SHM and SSI can result in significant time and cost savings, avoiding estimations that cannot be trusted due to their large uncertainty.
Thus, this paper presents a decision-based approach that helps: (1) To select an adequate combined SHM + SSI strategy that minimizes the uncertainty of the estimations; (2) To determine to which extent the decisions on the SHM process influence the final error in the estimation; (3) To assess the contribution of the SSI-COM in this final error.
This new application of COM and decision tree is of great interest to optimize resources and also covers a clear gap identified in the related literature.
The approach is based on decision tree analysis. Decision tree algorithms are one of the most common techniques of inductive learning, especially in the field of Machine Learning (ML) [30,31]. The decision tree algorithm can be used for solving regression and classification problems. For its powerful capability to combine numerical with categorical data, its application in the area of civil engineering is gaining relevance [32]. A fuzzy group decision making (FGDM) approach offered a flexible, practical, and effective way of modeling bridge risks [33]. A decision support system for bridge maintenance was developed by extensive literature review, interviews with bridge maintenance experts, and a national survey [34]. The decision tree algorithm is used to analyze the deterioration of the health index of a set of concrete bridge decks [35]. A decision tree learning algorithm is adopted to train the model of a full-scale long-span suspension bridge using six recent years' database [36]. However, the analysis of the decision tree algorithm on the most critical factors to reduce the error of the estimated parameters is still insufficient. Decision trees dealing with the selection of the optimal measurement sets or model parameters do not appear in the literature. In this paper, the new application of decision trees combined with SHM + SSI provides a new insight into the problem of structural identification and damage detection.
The paper is organized as follows. Section 2 introduces the main idea of COM on dynamic SSI and its main steps. The principle of the decision tree algorithm and the proposed roadmap for an optimal SHM + SSI strategy are also presented in Section 2. An initial descriptive study of the most relevant variables affecting the estimation accuracy of an SHM + SSI strategy is conducted in Section 3, which presents a theoretical context to explain the approach and to show the usefulness of this strategy. In Section 4, a real bridge is used to verify the proposed methodology: the decision trees yield an optimal SHM + SSI strategy for a real assessment project in the Netherlands (InfraWatch Project). The damaged and undamaged cases are analyzed in this section. Lastly, in Section 5, some conclusions are drawn.

Dynamic constrained observability method
The dynamic SSI by COM is explained in this section, highlighting the differences with the OM. Dynamic SSI by COM [16] imposes constraints on the variables when no more parameters can be observed using SSI by OM [15]. In this methodology, the modal torsion effect and the difference in error between first- and second-mode information are neglected for the sake of simplicity. The advantage of OM is its systematization and standardization, that is, it can be easily implemented using conventional programming software packages (e.g., Python or Matlab). Nonetheless, when dealing with large-scale structures, more sophisticated programming might be required to reduce the computational time.
In both methods, a finite element model (FEM) should be defined first. Then the dynamic equilibrium equation can be established as shown in Eq. (1), without damping or externally applied forces. For 2D structural models made of beam elements loaded in their plane, the size of the stiffness matrix $K$ and of the mass matrix $M$ is $(3N_N - N_B) \times (3N_N - N_B)$, and the size of the vector of modal displacements $\phi_i$ is $(3N_N - N_B) \times 1$. $N_N$ and $N_B$ represent the total number of nodes and the total number of boundary conditions, respectively. The sub-index $i$ refers to the $i$-th vibration mode out of the total number of vibration modes $R$ considered.
The global stiffness matrix $K$ includes the geometrical and mechanical properties of each element $j$, such as length $L_j$, bending stiffness $EI_j$, and axial stiffness $EA_j$. $\phi_i$ includes the deformation in the x-direction ($u_{ki}$), the y-direction ($v_{ki}$) and the rotation ($w_{ki}$) at each node $k$ for each vibration mode $i$. $\lambda_i$ is the squared frequency of the $i$-th vibration mode. The objective of SSI is to identify the structural parameters $\theta$ (i.e., bending stiffness $EI_j$ and axial stiffness $EA_j$). Once the known and unknown structural parameters as well as the boundary conditions of the structure have been defined, the unknowns are clustered together as shown in Eq. (2). $rx$, $sx$, $mx$, and $nx$ represent the number of columns of the corresponding stiffness coefficient sub-matrices and mass coefficient sub-matrices. The known and unknown items are indicated by subscripts 1 and 0, respectively. The detailed parts of the modified stiffness and mass matrices $K_i^*$ and $M_i^*$, and of the modified modal shapes $\phi_{Ki}^*$ and $\phi_{Mi}^*$, are shown in Eq. (2). The next step is to rearrange the system such that all the unknowns of the system are in one column vector. By doing so, it is possible to obtain the system of equations in the form of Eq. (3) for the $i$-th vibration mode. Eq. (4) illustrates the combination of multiple modal information (the first $R$ modes).
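As a minimal sketch of the eigenproblem behind Eq. (1), assuming it takes the standard undamped free-vibration form $K\phi_i = \lambda_i M \phi_i$, the snippet below solves a hypothetical two-degree-of-freedom spring-mass chain in closed form. The function names and the stiffness and mass values are illustrative placeholders, not data from the bridge models of this paper.

```python
import math

def two_dof_modes(k1, k2, m1, m2):
    """Solve K*phi = lam*M*phi for K = [[k1+k2, -k2], [-k2, k2]], M = diag(m1, m2).

    Returns a list of (lam, phi) pairs sorted by lam, with phi scaled so phi[0] = 1.
    """
    # det(K - lam*M) = 0 expands to a*lam^2 + b*lam + c = 0
    a = m1 * m2
    b = -(m2 * (k1 + k2) + m1 * k2)
    c = k1 * k2
    disc = math.sqrt(b * b - 4.0 * a * c)
    modes = []
    for lam in sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a))):
        # First row of (K - lam*M) phi = 0 gives phi[1] for phi[0] = 1
        phi = (1.0, (k1 + k2 - lam * m1) / k2)
        modes.append((lam, phi))
    return modes

def residual(k1, k2, m1, m2, lam, phi):
    """Largest absolute entry of K*phi - lam*M*phi (zero for an exact mode)."""
    r1 = (k1 + k2) * phi[0] - k2 * phi[1] - lam * m1 * phi[0]
    r2 = -k2 * phi[0] + k2 * phi[1] - lam * m2 * phi[1]
    return max(abs(r1), abs(r2))

# Hypothetical unit stiffnesses and masses
modes = two_dof_modes(k1=1.0, k2=1.0, m1=1.0, m2=1.0)
```

If $\lambda_i$ is read as the squared circular frequency (an assumption here), the natural frequency in Hz follows as $\sqrt{\lambda_i}/(2\pi)$. In the SSI setting the roles are reversed: $\lambda_i$ and part of $\phi_i$ are measured, and the stiffness entries of $K$ are the unknowns.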
To deal with Eq. (4), the OM is used first. Hence, the products of unknowns in $z$ are treated as independent variables. This means that $Bz = D$ is a system of linear equations whose general solution is the sum of a particular solution $z_p$ and a homogeneous one $z_{nh}$, which corresponds to the case $Bz = 0$. $z_p$ is the pseudo-inverse solution of Eq. (4). $z_{nh}$ can be expressed as the combination of a basis of the null space and arbitrary real values, $[V]\{\tau\}$.
The value of $[V]$ is critical for the solution of $Bz = D$. If any row of the null space $[V]$ is composed of only zeros, then the corresponding entry of the particular solution is the unique solution for that parameter, which is then categorized as an observed parameter. In the next analysis iteration, all observed parameters are introduced as known parameters to obtain updated equations, so that new parameters might be observed. In Appendix I, a simple example has been added to further explain Eqs. (1)-(5).
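The observability test described above can be sketched on a small linear system. The code below is an illustration under stated assumptions, not the authors' implementation: it replaces the pseudo-inverse with exact Gaussian elimination over fractions (for an observed unknown, every particular solution gives the same value), it assumes $Bz = D$ is consistent, and the helper names `rref` and `observed_unknowns` as well as the example system are hypothetical.

```python
from fractions import Fraction

def rref(rows, n_cols):
    """Reduce rows to reduced row-echelon form, pivoting on the first n_cols columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: the unknown is free
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def observed_unknowns(B, D):
    """Return {index: value} for the unknowns of B z = D that are uniquely determined.

    An unknown is observed when its row in the null-space basis is all zeros,
    i.e. when its pivot row has no coefficient on any free unknown.
    """
    n = len(B[0])
    augmented = [list(row) + [d] for row, d in zip(B, D)]
    m, pivots = rref(augmented, n)
    free = [c for c in range(n) if c not in pivots]
    observed = {}
    for r, c in enumerate(pivots):
        if all(m[r][f] == 0 for f in free):
            observed[c] = m[r][n]
    return observed

# Hypothetical system: z0 = 2 and z1 + z2 = 5, so only z0 is observed
result = observed_unknowns([[1, 0, 0], [0, 1, 1]], [2, 5])
```

In an OM loop, the observed values would then be moved to the known side of the system and the test repeated until no new unknown is observed.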
In real applications, full observability is seldom achieved. Thus, COM arises to overcome the OM's limitations in the dynamic case. The main idea of COM is to identify the relationships between the unknowns in $z$ that have been neglected by the OM. These unknowns are monomials of degree one, $\{EA_j, EI_j, u_{ik}, v_{ik}, w_{ik}\}$, and monomials of degree two, $\{EA_j u_{ik}, EI_j v_{ik}, EI_j w_{ik}\}$. The relationships between these types of monomials, i.e., $(EA_j u_{ik}) = EA_j \cdot u_{ik}$, $(EI_j v_{ik}) = EI_j \cdot v_{ik}$, $(EI_j w_{ik}) = EI_j \cdot w_{ik}$, cannot be introduced into the SSI by OM because it is a linear method, so some variables may not be successfully identified. COM [12,16] uses an optimization method to overcome this non-linearity.
COM is applied after the last OM loop, in which no more variables can be observed. An objective function is used to achieve full observability while considering the effect of measurement errors. Eq. (6) is used to minimize the squared sum of the frequency-related error and the mode shape-related error. $\Delta\lambda_i$ is the difference between the measured $\lambda_i$ and the estimated circular frequencies. $MAC_i$ is the modal assurance criterion, which measures the closeness between the calculated mode shape, obtained from the inverse analysis using the estimated stiffnesses, and the measured mode shape $\phi_{mi}$, as shown in Eq. (7). $W_\lambda$ and $W_\delta$ are the weighting factors of the circular-frequency components and the mode-shape components, respectively. In most analyses, $W_\lambda$ and $W_\delta$ are assumed to be equal [24]. In this paper, however, the influence of the weighting factors is discussed. In the following analysis, the weighting factor for the frequency error is $W_\lambda$, and the corresponding weighting factor for the mode shapes is $1 - W_\lambda$.
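A weighted objective of this kind can be sketched as follows. Eqs. (6) and (7) are not reproduced in this text, so the exact residual forms are assumptions: the MAC uses its standard definition for real mode shapes, the frequency residual is taken as the relative difference in $\lambda_i$, and the mode-shape residual as $1 - MAC_i$. The function names are hypothetical.

```python
def mac(phi_a, phi_b):
    """Modal assurance criterion between two real mode-shape vectors (standard form)."""
    cross = sum(a * b for a, b in zip(phi_a, phi_b))
    norm_a = sum(a * a for a in phi_a)
    norm_b = sum(b * b for b in phi_b)
    return cross * cross / (norm_a * norm_b)

def objective(lam_measured, lam_estimated, phi_measured, phi_estimated, w_lambda):
    """Weighted squared sum of frequency and mode-shape residuals over the modes.

    ASSUMPTION: relative residual in lam and (1 - MAC) residual in the mode shapes;
    the mode-shape weight is 1 - w_lambda, as stated in the text.
    """
    freq_term = sum(((lm - le) / lm) ** 2
                    for lm, le in zip(lam_measured, lam_estimated))
    shape_term = sum((1.0 - mac(pm, pe)) ** 2
                     for pm, pe in zip(phi_measured, phi_estimated))
    return w_lambda * freq_term + (1.0 - w_lambda) * shape_term

# Hypothetical single-mode example: 10% error in lam and orthogonal mode shapes
value = objective([1.0], [0.9], [[1.0, 0.0]], [[0.0, 1.0]], w_lambda=0.5)
```

Because the MAC is insensitive to the scaling of the mode shapes, this form does not require the measured and calculated shapes to share a normalization. In COM, an optimizer would vary the unknown stiffnesses, recompute `lam_estimated` and `phi_estimated`, and minimize this value subject to the constraints between the monomials.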
The constraints imposed on the unknowns in $z$, such as $(EA_j u_{ik}) = EA_j \cdot u_{ik}$, should be considered in the process of minimizing Eq. (6). Fig. 1 shows the COM steps, in which the OM is included and highlighted by the black box. COM builds on the OM and adds steps 10 to 14.
The reader is referred to the work [16] if more detailed information about dynamic SSI by COM is required. The displacements and rotations mentioned in this COM method refer to the mode shape displacements and rotations. This means that displacements and rotations are not directly measured, but obtained from the mode shape.
In order to compare the evaluation effect of all parameters, an error index ρ, associated with the estimation by Eq. (6) is proposed. The error index is calculated as the mean squared error of the n estimated parameters θ , that is,

Decision tree learning algorithm
A decision tree learning algorithm is employed in this study to establish a regression model that assesses the effect of the input factors (such as the boundary layout, measurement set, weighting factor and span length) on the error-index ρ of the estimation. The algorithm starts from a root node, from which child nodes gradually grow to form a tree structure. The merits of decision trees are that they are computationally cheap to use, the learned results are easy to understand, results can be obtained even if some values are missing, and they can deal with irrelevant features [20,37].
To build a decision tree, the factor used to split the data at each node must be chosen according to an established splitting criterion. To identify which factor is adequate, every factor is considered and its effect on the splitting result is measured. Then, the best factor is chosen. A binary decision tree is proposed in this paper; thus, at each node, the data are split into two subsets. If the data of a subset on a branch are of the same class, there is no need to continue splitting, and the branch stops at this point. Otherwise, the splitting process on this subset continues. Stopping criteria can also be imposed, such as a minimum number of data points belonging to a subset and a maximum depth of the tree.
This section uses the CART (classification and regression tree) algorithm [38], which is an effective decision tree learning algorithm. The CART algorithm builds binary trees and can handle discrete as well as continuous split values. Given that the response variable is the error-index ρ, regression trees are used, with variance reduction as the splitting criterion. The variance reduction of a node is defined as the total reduction of the variance of the response variable caused by the split at that node, and it is calculated by Eq. (9), where $S$, $S_t$ and $S_f$ are the set of sample indices before splitting, the set of sample indices for which the split test is true, and the set of sample indices for which the split test is false, respectively. Note that each summand of Eq. (9) is a variance estimate.
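The splitting criterion can be illustrated with a short sketch. Eq. (9) is not reproduced in this text, so the weighting used below is an assumption: each child variance is weighted by its share of the parent samples, a common convention for regression trees. The data are hypothetical and the function names are illustrative.

```python
def variance(values):
    """Population variance; equals (1/n^2) * sum_i sum_j 0.5 * (y_i - y_j)^2."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def variance_reduction(parent, true_branch, false_branch):
    """Parent variance minus the sample-weighted variances of the two children.

    ASSUMPTION: children are weighted by their fraction of the parent samples.
    """
    n = len(parent)
    return (variance(parent)
            - len(true_branch) / n * variance(true_branch)
            - len(false_branch) / n * variance(false_branch))

def best_threshold(x, y):
    """Scan midpoints between sorted distinct x values; return (threshold, reduction)."""
    best = (None, 0.0)
    levels = sorted(set(x))
    for low, high in zip(levels, levels[1:]):
        t = (low + high) / 2.0
        true_branch = [yi for xi, yi in zip(x, y) if xi < t]
        false_branch = [yi for xi, yi in zip(x, y) if xi >= t]
        gain = variance_reduction(y, true_branch, false_branch)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical data: weighting factor versus error index (in %)
w = [0.5, 0.6, 0.7, 0.8, 0.9]
rho = [4.0, 4.0, 2.0, 2.0, 2.0]
threshold, gain = best_threshold(w, rho)
```

For a categorical factor such as the layout, the same reduction would be evaluated over partitions of the categories instead of over thresholds.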

Method of optimal SHM + SSI
In a general case, the application of the DT in combination with the COM to plan an optimal SHM + SSI strategy is summarized as follows: (1) the undamaged structure is considered, assuming the designed layout and mechanical properties. Its dynamic behavior is obtained (direct analysis), that is, the deformation in the x- and y-directions and the rotation at each node for each considered vibration mode, together with the corresponding frequencies; (2) different combinations of measurement devices (i.e., number, type, and location) are defined along with the accuracy of the devices. The devices should be located with the aim of determining the unknown (target) parameters; (3) the measurement records given by the SHM are simulated by considering, for each combination of measurement devices, the theoretical (undamaged) values of deformation and rotation distorted by a random error consistent with the accuracy of the corresponding device. In this way, the dynamic behavior of an undamaged structure recorded by inaccurate devices is simulated. The number of simulations for each combination of measurement devices should be large enough to capture the stochastic nature of the process; (4) the observability-based SSI using the COM is conducted (inverse analysis) to obtain the unknown (target) parameters for each simulated measurement record. Different values of the weighting factor $W_\lambda$ are used to conduct this analysis; (5) the error-index ρ is obtained by comparing the values of the target parameters obtained through the direct and inverse analyses; (6) the decision tree is built using the measurement sets and the values of the weighting factors as explanatory variables, and the error-index as the response variable; (7) the information provided by the decision tree supports the decision on the best combination of measurement devices and weighting factor.
For the sake of illustration, Fig. 2 shows a roadmap of the steps to follow.
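Steps (1), (3), (4) and (5) of this procedure can be sketched as a Monte Carlo loop. The example below is a deliberately simplified stand-in: a hypothetical single-degree-of-freedom oscillator with $\lambda = k/m$ replaces the beam model and the COM, and the error is assumed to have a random sign and a magnitude drawn uniformly from a given range. All names and values are illustrative.

```python
import random

def simulate_error_indices(k_true, m, err_low, err_high, n_samples, seed=0):
    """Monte Carlo sketch: direct analysis, noisy record, inverse analysis, error index."""
    rng = random.Random(seed)
    lam_true = k_true / m  # step (1): direct analysis of the undamaged model
    indices = []
    for _ in range(n_samples):
        # step (3): distort the theoretical value by a random relative error
        # ASSUMPTION: random sign, magnitude uniform in [err_low, err_high]
        e = rng.choice((-1.0, 1.0)) * rng.uniform(err_low, err_high)
        lam_measured = lam_true * (1.0 + e)
        # step (4): inverse analysis (here simply k = lam * m instead of the COM)
        k_estimated = lam_measured * m
        # step (5): error of the estimate normalized by the actual value
        indices.append(abs(k_estimated / k_true - 1.0))
    return indices

# Hypothetical stiffness and mass; 1% to 3% error on the measured quantity
rho_samples = simulate_error_indices(k_true=2.0e6, m=1.0e3,
                                     err_low=0.01, err_high=0.03, n_samples=1000)
mean_rho = sum(rho_samples) / len(rho_samples)
```

Repeating this loop for every combination of measurement set and weighting factor yields the table of error indices on which the decision tree of steps (6) and (7) is trained.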

Theoretical Framework for an optimal SHM + SSI strategy
This section provides a theoretical framework to clarify the method and illustrate the utility of this technique regarding structural behavior by performing an initial descriptive analysis of the most important variables influencing the estimation accuracy of a SHM + SSI strategy.

Introduction of case study
In this section, an academic analysis of different factors (layout, span length, measurement sets, and weighting factors) is introduced in detail under the COM framework with the objective of (i) achieving a better understanding of the influence of these factors on the output uncertainty and (ii) showing the need for more sophisticated tools able to capture the joint effect of these factors. Four bridge layouts are assumed according to different boundary conditions: 1|Pinned-pinned, 2|Pinned-clamped, 3|Clamped-pinned, 4|Clamped-clamped, which are shown in Fig. 3. The FEMs of the four bridge types are defined by 7 nodes and 6 beam elements. Three types of sections are considered: ①, ②, ③. The bending stiffnesses $EI_2$ and $EI_3$ indicated in Fig. 3 are assumed to be unknown. For these layouts, the masses $m_1$, $m_2$, $m_3$, the length of each element $L/6$, and the bending stiffness of Section 1, $EI_1$, are assumed to be known. Since the horizontal displacement of the bridge is small, the influence of the horizontal direction is ignored. The first two vibration modes are used in this study. The sample points are studied for three scenarios that differ in the measurement sets considered, which are shown in Fig. 4. These three measurement sets include the vertical and rotational modal displacements at nodes (4, 5, 7), (4-7) and (1-7), respectively. The nodes are given in Fig. 3.
A variable span length is also considered, that is, 50 m, 55 m, and 60 m, together with five values of the weighting factor $W_\lambda$, that is, 0.5, 0.6, 0.7, 0.8, and 0.9. The collection of all scenarios is given in Table 1. The reason for choosing $W_\lambda \geq 0.5$ is that the frequencies are more sensitive to small changes in stiffness than the mode shapes [39]. Besides, for each scenario combining layout, measurement set, span length, and weighting factor, the frequencies, vertical displacements, and rotational modal coordinates are introduced with a given error level. The error is assumed to follow a uniform distribution between 1% ~ 3%, 2% ~ 6% and 10% ~ 30%, respectively. The frequency error range was chosen by checking the frequency accuracy of several dynamic tests [40][41][42] in which different analytical methods were used for identification. The vertical displacement error range was taken from reference [40], which identifies the vertical displacement with accuracies of about 2% ~ 6%. As the accuracy of rotations is lower than that of vertical displacements [24], a range of 10% ~ 30% was chosen for them. In experiments or field measurements under free or ambient vibration, modal analysis was originally performed through Experimental Modal Analysis (EMA). Because EMA has the drawback of requiring the input forces, several Operational Modal Analysis (OMA) methods were developed [43], including the Peak-Picking method, the Auto Regressive-Moving Average Vector model, the Natural Excitation Technique, the Random Decrement Technique, the Frequency Domain Decomposition and the Stochastic Subspace Identification. The Frequency Domain Decomposition (translational frequency response function [44] and rotational frequency response function [45]) has been verified to yield the vertical and rotational mode displacements with acceptable accuracy. A total of 1000 samples are used to analyze each measurement set. Each sample contains one model response of frequencies and mode-shape displacements.
For both clamped and pinned supports, the vertical displacements are set to 0. For the clamped supports, the rotations of the corresponding nodes are also set to 0. All combinations of the influencing factors are listed in Table 1.
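The full factorial design implied by these factors can be enumerated directly. The sketch below uses the four layouts, three measurement sets, three span lengths and five weighting factors named in the text; the variable names and the tuple layout are illustrative assumptions.

```python
from itertools import product

layouts = ["pinned-pinned", "pinned-clamped", "clamped-pinned", "clamped-clamped"]
measurement_sets = ["Set 1", "Set 2", "Set 3"]
span_lengths_m = [50, 55, 60]
weighting_factors = [0.5, 0.6, 0.7, 0.8, 0.9]

# One scenario per combination of the four factors
scenarios = list(product(layouts, measurement_sets, span_lengths_m, weighting_factors))
n_scenarios = len(scenarios)  # 4 * 3 * 3 * 5 combinations
```

Each of these tuples would be paired with the error index obtained from its simulated samples to form one row of the decision tree training data.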
The results can be validated through the comparison of the estimations given by the SSI against the response of the real structure or a structural model that is error-free.

Analysis of the combined factors
The error index ρ, calculated from the two unknown stiffnesses, is

$$\rho = \sqrt{\frac{(\widetilde{EI}_2 - 1)^2 + (\widetilde{EI}_3 - 1)^2}{2}}$$

where $\widetilde{EI}_2$ and $\widetilde{EI}_3$ are the estimated stiffnesses normalized by the actual value, i.e., a value of 1 denotes perfect estimation of the parameter. The larger the value of ρ, the lower the overall accuracy of the estimated parameters.
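An error index of this kind can be sketched in a few lines. The form below is an assumption: it takes the root of the mean squared deviation of the normalized estimates from 1, and the function name and input values are hypothetical.

```python
import math

def error_index(normalized_estimates):
    """Root-mean-square deviation of normalized estimates from the perfect value 1.

    ASSUMPTION: the index is the square root of the mean squared error.
    """
    n = len(normalized_estimates)
    return math.sqrt(sum((x - 1.0) ** 2 for x in normalized_estimates) / n)

# Hypothetical normalized estimates of two stiffnesses: 2% high and 4% low
rho = error_index([1.02, 0.96])
```

With normalized inputs the index is dimensionless, so it can be compared across scenarios with different span lengths and stiffness magnitudes.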
For each of the 180 scenarios, an average value of the stiffnesses is obtained from the 1000 samples assuming different random errors. The results are depicted in Figs. 5, 6, and 7. From these figures, it can be seen that the span length has a minor contribution to the error-index ρ, whereas the bridge layout has the largest influence. It is also noted that the results obtained for the pinned-clamped and clamped-pinned layouts are very similar. The influence of the weighting factor $W_\lambda$ varies with the measurement set: for Set 1, its influence is very small, especially for the pinned-clamped and clamped-pinned layouts; for Set 2, the weighting factor displays its largest influence, which occurs for the pinned-pinned support conditions; and for Set 3, the largest influence of the weighting factor occurs for the pinned-clamped and clamped-pinned layouts.
The worst results are given by the clamped-clamped support conditions with measurement Set 1, with values of the error-index ρ close to 12%, whereas the best results (error index around 1.8%) correspond to the same support conditions, clamped-clamped, with measurement Set 3. In the worst case, the mean values of $\widetilde{EI}_2$ and $\widetilde{EI}_3$ are 1.01 and 0.97, with standard deviations of 0.08 and 0.096, respectively. These values are acceptable for SSI. These two results highlight the complexity of designing an optimal SHM + SSI strategy, given the joint influence of the involved variables on the quality of the estimation.
To facilitate the understanding of these interactions, the following section presents the decision trees that will allow the organization of the scenarios according to the resulting error-indices ρ.

Identification of impact factor by decision tree learning algorithm
The CART is applied to the 180 scenarios, obtaining the decision tree model shown in Fig. 8, whose node characteristics are indicated in Table 2. The decision tree considers four explanatory variables: layout, span length, measurement set, and weighting factor $W_\lambda$. Although the decision tree can be pruned to remove those branches that provide little classification power, the entire decision tree is presented to allow the interpretation of the results and the explanation of the tree itself. The response variable (error-index ρ) of the 180 cases has a mean value of 3.97% and a standard deviation of 2.59% (see Node 1).
From Fig. 8, it is clear that the structural layout is the factor that splits the tree at the first level, which means that the layout has the greatest influence on the error-index ρ among the four explanatory variables. The layouts 2|pinned-clamped and 3|clamped-pinned belong to the left branch, and 1|pinned-pinned and 4|clamped-clamped belong to the right branch. Besides, the error-index appears to be smaller when the boundary conditions of the bridge are asymmetric (left branch). This may be because asymmetric conditions can be decomposed into symmetric and anti-symmetric conditions, which exhibit some offsetting behaviors in the parameter evaluation process. That is why the values of PP60 and CC60 in Fig. 5 are larger than the other two. The next most important factor is the selection of the measurement set, as indicated by the second level of the tree. The values of measurement Set 1 are classified separately from those of measurement Sets 2 and 3. The selection of either pinned-clamped or clamped-pinned support conditions and measurement Set 1 yields a mean value of the error-index ρ of 1.93% with a standard deviation of 0.12% (see Node 4 in Table 2), whereas selecting measurement Sets 2 and 3 for these types of support raises the mean error to 3.17%, about 1.6 times larger (see Node 5 in Table 2).
For Layouts 2|pinned-clamped and 3|clamped-pinned with measurement Set 1, neither the weighting factor $W_\lambda$ nor the span length has a large influence, as shown by the small coefficient of variation at Node 4 (0.12/1.93 = 0.06). However, in other cases, such as Node 11 (Layout 2|pinned-clamped or 3|clamped-pinned and measurement Set 3) and Node 14 (Layout 1|pinned-pinned and measurement Set 2), an adequate selection of the weighting factor ($W_\lambda \geq 0.85$ for the former and $W_\lambda \geq 0.65$ for the latter) can almost double the accuracy of the results.
It is noted that the span length does not appear on the right side of the tree (Layouts 1|pinned-pinned and 4|clamped-clamped) and only appears at Node 8 (cv = 0.08/1.88 = 0.04), meaning that the span length has a residual influence on the results, at least in the studied range (50 to 60 m). This is consistent with the outcome of Section 3.2. From this decision tree, the best decisions are made by comparing the classifications, and several recommendations can be drawn. Firstly, measurement Set 1 is the best choice for Layouts 2|pinned-clamped and 3|clamped-pinned. In this case, the role that the weighting factor $W_\lambda$ has on the accuracy of the estimated $\widetilde{EI}_2$ and $\widetilde{EI}_3$ is negligible. Secondly, a weighting factor of 0.9 is the best choice for Layouts 2|pinned-clamped or 3|clamped-pinned and measurement Set 3. In this case, the weighting factor plays a relevant role in improving the accuracy of the estimation. Thirdly, the optimal measurement choices for Layout 4|clamped-clamped are Sets 2 and 3. Finally, the combination of Layout 1|pinned-pinned or 4|clamped-clamped with measurement Set 1 should be avoided because of the resulting large error-index ρ, regardless of the assumed value of the weighting factor $W_\lambda$.
It is noted that these values correspond to the training set, so different values can be observed in real practice.
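The coefficients of variation quoted for Nodes 4 and 8 can be checked with a short calculation. The node means and standard deviations are the values given in the text; the function name is illustrative.

```python
def coefficient_of_variation(std, mean):
    """Ratio of standard deviation to mean (both in the same units)."""
    return std / mean

# Node 4: mean 1.93 %, standard deviation 0.12 %
cv_node4 = coefficient_of_variation(0.12, 1.93)
# Node 8: mean 1.88 %, standard deviation 0.08 %
cv_node8 = coefficient_of_variation(0.08, 1.88)
```

Both ratios stay well below 0.1, which is what supports reading the remaining factors at these nodes as having little influence.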

Discussion on the optimal SHM + SSI strategy
This part investigates the sensitivity of the outcomes to the effects encompassed by each scenario. Using the analysis results of Section 3.3, the influence of each factor is found by the control variable method: after removing one of the four variables (Table 1), the DT is used to analyze the remaining three. Table 3 shows the corresponding optimal SHM + SSI strategy when considering the remaining three factors.
From the results of Table 3, the optimal SHM + SSI strategy corresponds to the choice in Section 3.3. For this particular structural analysis, the span length of the bridge is clearly not a relevant variable in any optimal strategy, at least in the studied range, since no strategy involving the span length appears in Table 3. This is a sound result that supports the rationality of the method. The other studied variables, i.e., the layout, the measurement set and the frequency-related error weighting factor, are the more complex cases when applying the observability method with COM. Although the structural layout is not a decision variable, as it cannot be selected, it is the most important input when deciding the optimal SHM + SSI strategy, as can be seen in the last three rows of Table 3. The second most important decision is the selection of the measurement set. Thirdly, the selection of an adequate value of the weighting factor can significantly increase the quality of the estimation in some cases when the influence of layout and measurement set is ignored. Whether low or high values of the weighting factor perform better depends on the specific case, and decision trees can be used to derive this value. These results are case-specific, but the process of choosing a strategy can be followed in a similar way. The next section analyzes to what extent the information provided by the decision trees, based on the assumption of an undamaged structure, can support decisions on the optimal SHM + SSI strategy in a real case.

Introduction of case study
The Hollandse Brug (Fig. 9), located in the central part of the Netherlands, belongs to one of the main highway connections between Amsterdam and the northeast of the country. It is a prestressed concrete bridge composed of precast beams and an upper concrete slab poured in situ, and it was opened to traffic in 1969. The bridge has seven spans of 50.55 m. A dilatation joint was placed between adjacent spans, so bending moments cannot be transferred from one span to the next. Thus, each span can be considered separately as a simply supported structure. To extend its service life, renovation and strengthening works were carried out in 2008.
SHM data were collected within the InfraWatch research project to support the service-life assessment of the renovated bridge. The SHM system consists of sensors positioned on three cross-sections of the first span (Fig. 10a).
Based on the vibration data gathered with accelerometers (geophones) located at various intervals along and across the bridge, detailed information about the mode shapes (Fig. 10b) and natural frequencies (f_1 = 2.51 Hz, f_2 = 10.09 Hz) was obtained [46][47][48] using the Peak-Picking method and Stochastic Subspace Identification. It is worth mentioning that previous studies on this bridge focus only on data collection, data processing, and comparison with FEM model results, rather than on the identification of structural health. The following analysis fills this gap.

Decision tree for the Hollandse Brug
The goal is to define the best SHM + SSI strategy to assess the unknown parameters EI_2 and EI_3 according to Fig. 3. Each span of the bridge can be assumed to be pinned-pinned because of the dilatation joints. Through model calibration based on the two natural frequencies, the simplified model was identified as corresponding to Layout 1 in Fig. 3. The parameters of each element are shown in Table 4.
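A quick plausibility check of the pinned-pinned assumption can be made from the two measured frequencies alone. For a uniform, simply supported Euler-Bernoulli beam the natural frequencies scale with n^2, so f_2 / f_1 should be close to 4. The sketch below is only this textbook check; it is not the model calibration used in the paper, and the calibrated model has non-uniform stiffness (EI_2, EI_3), so exact agreement is not expected.

```python
# Natural frequencies of the first span reported in the text (Hz).
f1_measured = 2.51
f2_measured = 10.09

def pinned_pinned_ratio(n: int) -> float:
    """f_n / f_1 for a uniform simply supported Euler-Bernoulli beam,
    whose natural frequencies are proportional to n**2."""
    return float(n ** 2)

measured_ratio = f2_measured / f1_measured
theoretical_ratio = pinned_pinned_ratio(2)
relative_gap = abs(measured_ratio - theoretical_ratio) / theoretical_ratio

print(f"measured f2/f1      = {measured_ratio:.3f}")
print(f"uniform pinned beam = {theoretical_ratio:.1f}")
print(f"relative difference = {relative_gap:.2%}")
```

The measured ratio differs from the ideal value by well under 1%, which is consistent with treating each span as a simply supported beam.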
In this case, real data are used to build a model of the Hollandse Brug. The structural response given by this model (mode shapes, vibration frequencies, etc.) is taken as the real response of the bridge and used as a baseline for the validation of the method.
The errors of the measured frequencies, vertical displacements, and rotations are assumed to follow uniform distributions bounded between 1% and 3%, 2% and 6%, and 10% and 30%, respectively. A total of 1000 samples are analyzed with the dynamic COM under six different measurement sets, which are shown in Fig. 11.
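The sampling of measurement errors described above can be sketched as follows. The bounds and the sample count come from the text. The random sign applied to each error, the multiplicative error model, and all names are assumptions made for illustration; the COM analysis itself is not reproduced here.

```python
import random

# Error bounds as fractions, taken from the text.
ERROR_BOUNDS = {
    "frequency": (0.01, 0.03),
    "vertical_displacement": (0.02, 0.06),
    "rotation": (0.10, 0.30),
}
N_SAMPLES = 1000

def sample_relative_errors(rng, n=N_SAMPLES):
    """Draw n sets of signed relative errors, one value per measured quantity."""
    samples = []
    for _ in range(n):
        sample = {}
        for quantity, (low, high) in ERROR_BOUNDS.items():
            magnitude = rng.uniform(low, high)  # uniform within the bounds
            sign = rng.choice((-1.0, 1.0))      # assumption: over- or underestimate
            sample[quantity] = sign * magnitude
        samples.append(sample)
    return samples

def perturb(true_value, relative_error):
    """Assumed multiplicative error model: measured = true * (1 + error)."""
    return true_value * (1.0 + relative_error)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
errors = sample_relative_errors(rng)

# Example: noisy versions of the first natural frequency (2.51 Hz).
noisy_f1 = [perturb(2.51, e["frequency"]) for e in errors]
print(len(noisy_f1), min(noisy_f1), max(noisy_f1))
```

Each of the 1000 perturbed measurement sets would then be passed to the identification step, and the resulting error indices collected per measurement set.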
An initial decision tree with no information on the level of damage of the structure (undamaged bridge, with all cross-sections having the properties presented in Table 4) can be drawn according to Section 2.2, as shown in Fig. 12. From this decision tree, the best decision is to select Set 4 (see Node 4 of the tree). The results of Sets 2, 5, and 6 are significantly better than those of Sets 1 and 3. This clear difference cannot be easily foreseen without the decision tree, which shows that the results are not trivial. Regarding the weighting factor W_λ, lower values (W_λ < 0.75) perform better than higher values (W_λ > 0.75), so this is a relevant aspect to consider in order to reduce the error-index ρ in most cases (compare Nodes 8 and 9). The detailed information of each node is shown in Table 5. In the last column of the table, the coefficient of variation, i.e., the standard deviation normalized by the mean, reaches a maximum value of 0.41 for an end node, which indicates the robustness of the tree.
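The coefficient of variation reported per node can be computed as in the short sketch below. The use of the population standard deviation is an assumption, since the text does not state which estimator was used, and the sample values are hypothetical rather than taken from Table 5.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation of `values` divided by their mean."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    # Assumption: population standard deviation. A sample estimate
    # (statistics.stdev) would give a slightly larger value.
    return pstdev(values) / mu

# Hypothetical error-index values (%) of the samples reaching one end node.
rho_in_node = [0.6, 0.7, 0.8]
cov = coefficient_of_variation(rho_in_node)
print(f"CoV = {cov:.3f}")
```

A low value means the error indices of the samples in a node cluster tightly around the node mean, which is the sense in which a small coefficient of variation supports the robustness of the tree.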

Validation for a damaged bridge
Once the theoretical decision tree has been obtained for the undamaged bridge, two damage scenarios are analyzed in this section. The bridge mid-span is assumed to be damaged, with reductions of 5% and 30% in EI_2, as shown in Fig. 13. The damage patterns have been assumed in order to create the scenarios needed to validate the approach. Nevertheless, this knowledge is not introduced as an input of the model. Therefore, knowing the damage patterns is not required to apply the method. The COM is used to obtain the estimates ẼI_2 and ẼI_3, with the parameters in Table 4 and the two damage levels. The errors of frequencies and vertical displacements are the ones indicated in Section 4.2. To account for the measurement error, the average value of 1000 simulations has been considered. Finally, the parameters have been estimated for W_λ = 0.5, 0.6, 0.7, 0.8 and 0.9. Table 6 summarizes the error-index comparison between the theoretical values given by the decision tree (undamaged structure) and the estimation for the damaged structure. According to the analysis in Section 4.2, the largest estimation error occurs with measurement Sets 1 and 3 (Nodes 6 and 7), whereas the best estimation is obtained with measurement Set 4 (Node 4). These two extreme cases are used to validate the decision tree.
The values yielded by Nodes 6 and 7 are fully consistent with the ones obtained for both damage levels. Note that the influence of W_λ indicated by the decision tree persists in the damaged bridge, which shows a larger standard deviation for W_λ < 0.75.
The values given by Node 4 are consistent with the results for 5% damage; however, a larger mean error-index ρ is found for 30% damage. This can be related to the small value of the error-index ρ obtained for the undamaged structure (0.71%). Nonetheless, the values of the error-index ρ obtained for measurement Set 4 are clearly better than those of Sets 1 and 3. The small standard deviation obtained for the damaged bridge denotes the low influence of the weighting factor, which is consistent with the left branch of the decision tree (the weighting factor is not included in this branch). Based on these results, for the Hollandse Brug, the optimal measurement set is Set 4, the second choice is Sets 2, 5, and 6, and the worst sets are Sets 1 and 3, whether the bridge is damaged or not. These results suggest that decision trees combined with the COM are a useful tool to plan the best SHM + SSI strategy, providing information that is both non-trivial and highly reliable.

Conclusions
This paper proposes a decision tool to help build the best combined strategy of SHM and SSI, that is, the one that yields the most accurate estimations of the structural properties. To do so, the combination of the COM and decision trees (an optimal SHM + SSI strategy) is used for the first time.
The main concept of the optimal SHM + SSI strategy is given, as shown in the roadmap (Fig. 2). Decision trees (DT) are first used to investigate, for a general structure, how the variables involved in the SHM + SSI process influence the estimation error of the parameters obtained with the COM. These variables are the structural layout, the measurement set, the span length and the weighting factor. Through the sensitivity analysis of the COM and DT, the four variables are ranked as follows: layout, measurement set, parameters of the COM (weighting factor) and span length. The analysis of the different variables provides a theoretical framework that clarifies the method and illustrates the utility of the technique.
Later, the same concept is applied to a specific structure, the Hollandse Brug. The decision tree is used as a tool to plan the optimal SHM + SSI strategy with no initial knowledge of the actual structural state, and the robustness of the results is shown for two levels of damage. For this specific bridge, the optimal measurement set is Set 4, and Sets 1 and 3 should be avoided in practice. This real application shows the merit of the strategy in proposing the best sensor deployment and its potential application in the field of damage identification.
It is worth mentioning that the method has been verified on a real bridge with different levels of damage (5% and 30%), which shows that it is robust even for a high level of damage. The roadmap in Fig. 2 can be used as a guide for action. The application of this work allows better use of existing sensor devices and SSI methodologies. It can also help identify the main sources of inaccuracy or uncertainty in the results and thus focus attention on the aspects to be improved within the SHM + SSI strategy. For instance, the role of the weighting factor in the overall accuracy of the results has been identified, so this parameter merits further investigation. By using this tool beforehand, erroneous decisions can be avoided.

Fig. 10. a) Locations of the sensors [46] (mid-span (1st cross-section), 2nd cross-section, 3rd cross-section); b) First and second mode shape.
The approach does not consider modeling errors, such as the error introduced by wrong assumptions on the support conditions. In some cases these errors can introduce large uncertainty in the results, and they should be addressed before translating the proposed approach into practice. The proposed approach can be extended in this direction. Also, the decision tree can be extended by adding different SSI methods, in order to select the ones providing the most accurate results in each case. Moreover, the SHM + SSI strategy will be developed for more slender structures in the future. In addition, the operational effects of traffic on the bridge were not considered in the present example and are a future line of research.

Declaration of Competing Interest
All authors of this manuscript have directly participated in the planning, execution, and/or analysis of this study.
The contents of this manuscript have not been copyrighted or published previously.
The contents of this manuscript will not be copyrighted, submitted, or published elsewhere.
There are no directly related manuscripts or abstracts, published or unpublished, by any authors of this manuscript.
The authors have no financial or personal relationships with other people or organizations that could inappropriately influence this work.
The example presented in Appendix-Fig. 1 is analyzed to illustrate Eqs. (1) to (5). This structure is composed of 2 elements and 3 nodes. One single mode of vibration is studied. Therefore, the size of the matrix of coefficients of the system of equations is (3N_N − N_B) × (3N_N − N_B). The structure has the vertical and horizontal displacements restrained at nodes 1 and 3, that is, N_B = 4. Further, the horizontal displacement of this structure can be neglected. In this structure, the consistent mass matrix formulation has been used. Then, for each structural element j, the mass matrix depends on the total mass of the element m_j and on its length L_j. For this example, Eq. (1) would be as follows: