Effect of Model, Observer and their Interaction on State and Disturbance Estimation in Artificial Pancreas: an In-Silico Study

The state and disturbance estimations are an indispensable part of the state-of-the-art model-based controllers related to the artificial pancreas, supporting the decision-making and self-tuning of the algorithms. They are not just important when state-feedback kind of controller structure is applied, but also play a crucial role in the estimation of, for example, the amount of acting drug (insulin) in blood or meal intake estimation which has determining role in the short and long term effectivity of the given therapy. This information is also important for physicians to support them in knowledge-based decision-making to be sure a given therapy or device works well. This article compares three observers – a linear-parametervarying (LPV) dual Kalman filter (KF), a LPV joint KF, and a nonlinear sliding mode observer (NSMO) – designed with two individualized models – Hovorka and Identifiable Virtual Patient model (IVP). The article also statistically quantifies the effect of the observer algorithm and model structure on the accuracy of the estimation of plasma insulin, rate of glucose appearance, and glucose. Data for the analysis was generated by the UVa-Padova simulator. Results indicated that, for the rate of glucose appearance and the plasma insulin, the type of model and the observer structure explain less than 10% of the variability in the error, while the inter-patient variability contributes to the error more than 50%. This reveals a limiting factor in the estimation accuracy that might be improved by model parameter adaptation.

The most well-known of the observers in the field are the different realizations of the Kalman filter (KF) [6]; their performance is evaluated on different terms (general, fault detection, insulin estimation) in [7]- [9], respectively. Despite the prevalence of KFs, other observers have been applied, such as moving-horizon estimators [10], [11], extended-state observers [4], [12]- [14] or sliding mode observers [3], [15]. Among them, sliding mode observers are appealing since they are robust, have a simple structure, and can reconstruct disturbances [3].
A crucial element in the observer design is the model. The medium-complexity Hovorka model [16] has been widely used to design observers for glucose control [2], [17]. However, its large number of parameters could complicate model identification and observer tuning. For this reason, the use of simpler models might be desirable. Extensions of the Bergman's minimal model [18], such as the Identifiable Virtual Patient model (IVP) [19] might be an appealing option in this regard, as shown in [3] and [8]. Necessarily, a price to pay for model simplification is a loss of accuracy in modelling certain physiological behaviours. For example, while renal glucose clearance is considered in the Hovorka model, the IVP does not. Nevertheless, to the best of the authors' knowledge, no study has yet evaluated whether the larger flexibility offered by the Hovorka model in modelling complex behaviours has a notable impact on the observer performance, compared to other simpler models.
This article extends the results of a preliminary work presented in [20]. It aims to fill the two above-mentioned gaps through an in-silico analysis with the UVa-Padova simulator [21]. The contributions of the article are the following: • to evaluate the effect of the observer structure on the estimation performance by comparing three observers: two KFs and a sliding mode observer, which is simpler to tune. • to evaluate the impact of model complexity by designing the above observers with two models of different complexity: a model with medium complexity (the Howorka model) and a simpler one (IVP model). The parameters of the models have been identified for the adult cohort in the simulator, and a genetic algorithm-based approach has been employed to tune the KFs. Finally, the individual factors (observer type and model structure) were evaluated with statistical methods such as analysis of variance (ANOVA) and multiple comparisons.
The paper is constructed in the following way. First, Section II describes the Methods. The applied models are introduced in Section II-A, theoretical background of the identification in Section II-B and sensitivity analysis in Section II-B2, followed by the explanation of the observers in Section II-C and II-D. Results about identification and the statistical analysis are provided in Section III. Finally, Section IV closes the paper with the conclusions.

A. MODELS DESCRIPTION
One goal of this research is to determine how the model complexity affects the performance of the observer. To this end, the well-accepted Hovorka model in [16] was compared to IVP presented in [19]. For convenience, a brief description of the models is provided in this section.

1) Identifiable Virtual Patient Model
The IVP model [19] is a compromise between Bergman's model [18] and the Hovorka model regarding structural complexity and accuracy. Its equations are described below: I EF F (t) = −p 2 · I EF F (t) + p 2 · S I · I P (t) where I SC (t) and I P (t) represent the subcutaneous and plasma insulin concentrations ( mU/L), respectively. I EF F (t) is the insulin effect ( min −1 ) and G(t) is the plasma glucose concentration ( mg/dL). The inputs of the model are the subcutaneous insulin infusion u(t) ( µU/min), and the rate of glucose appearance R A (t) ( mg/dL/min) from the meal absorption. Note that the R A (t) is generally unknown due to the complexity of meal absorption dynamics, influencing factors like nutritional composition and alcohol intake. Hence, the R A (t) will be estimated in this work. Lastly, τ 1 and τ 2 ( min) refer to the insulin absorption time constants, p 2 is the kinetic rate for insulin action ( min −1 ), S I is the insulin sensitivity ( mL/µU/min), EGP is the hepatic glucose production, GEZI is the glucose effectiveness at zero insulin ( min −1 ) and C I denotes the insulin clearance ( mL/min).

2) Hovorka model
The Hovorka or Cambridge model is considered a medium complexity model by the AP scientific community. The model consists of 8 dynamic states and a couple of additional algebraic equations if the carbohydrate absorption sub-model is lumped, and replaced with the R A (t) term, as in the IVP model previously described. The dynamic equations are given by [16]: where Q 1 (t) ( mmol/kg) is the glucose content in the accessible compartment, Q 2 (t) ( mmol/kg) is the glucose content in the non-accessible compartment, R A (t) ( mmol/kg/min) is the rate of glucose appearance, is the remote effect of insulin on endogenous glucose production in the liver. S 1 (t) ( mU/kg) is the insulin in the accessible compartment and S 2 (t) ( mU/kg) is the insulin in the non-accessible compartment. Input of the system u(t) ( mU/kg/min) is the insulin infusion. The parameters: k 12 ( min −1 ) is the transfer rate from the non-accessible compartment to the accessible, EGP 0 ( mmol/kg/min) is the endogenous glucose production in case of zero insulin level, k e ( min −1 ) is the insulin clearance. The patient's insulin sensitivity on transport, disposal and, glucose production is determined by the k a1,2,3 ( min −1 ) and k b1,2 ( L/mU/min/min), k b3 ( L/mU/min) parameters. The insulin absorption time constant is denoted by τ S ( min) [16]. Supplementary algebraic equations are defined by: where F 01c ( mmol/kg/min) is the glucose consumption of the central nervous system, while F 01 represents the consumption at ambient glucose concentration, F R ( mmol/kg/min) is the renal excretion of glucose in the kidneys, G(t) ( mmol/L) is the actual output of the system: blood glucose concentration, V G ( L/kg) is the glucose distribution volume [16]. The units of the models can be converted in the following way: 1(mmol/L) = 18(mg/dL) for the blood glucose level, 1(mmol/kg/min) = 180/V g (mg/dL/min) for the rate of appearance, where V g (dL/kg) is the glucose distribution volume. The conversion between the insulin infusion is 1(mU/kg/min) = 1000/BW (uU/min), where BW (kg) is the body weight. Note that the IVP model expresses the variables relative to the volumes, while the Hovorka model, relative to the body weight.

B. IDENTIFICATION OF THE MODELS
The UVa-Padova simulator was selected to identify the models in Section II-A. The simulator received the approval of the Food and Drug Administration to be used as a substitute for preclinical trials with animals [22]. Also, the simulator model could fit glucose profiles from clinical data [23], and its meal model reconstructed the R A with more accuracy than other models in the literature [24]. Finally, it includes a realistic cohort of virtual subjects [25].
We identified two parameter sets: a population value parameter set (or average model) and individualized parameter set. The population value parameter set was determined from the average subject in the simulator. The individualized parameter sets were obtained by identifying two of the most sensitive parameters for each of the 10 individuals in the adult cohort (of the academic version of the simulator), while the remaining parameters were set to their corresponding values in the average model. The identification procedure followed the pathway of [26]: checking the structural identifiability, ranking the sensitive parameters, and identifying the parameters.

1) Identifiability analysis
Structural identifiability defines the possibility of determining under ideal conditions (e.g. absence of noise) a unique value (structurally globally identifiable) or a finite set of values (structurally locally identifiable) for model unknown parameters, with the only information of the inputs and outputs [27]. To analyse the structural identifiability, the generating series (GS) approach was used since it is a tradeoff method between computational cost and provided information regarding other techniques in the literature [27]. The analysis was performed with the GenSSI 2.0 software described in [28].
The inputs considered in the identifiability analysis were the insulin infusion and R A . Since the R A was available for the identification, parameters related to the carbohydrate absorption dynamics -the glucose distribution volume in the IVP model [19] and the parameters D G (amount of carbohydrates digested), A G (carbohydrate bioavailability), t max G (maximum time for the rate of glucose appearance) in the Hovorka model [16] -were excluded from the analysis. In addition, the weight BW in the Hovorka model is known, thus it was neither considered in the identification.

2) Sensitivity analysis
A global sensitivity analysis was performed with a twofold purpose: 1) to reduce the number of parameters to be identified in the Hovorka average model, and 2) to select the two VOLUME 4, 2016 parameters to be individualized in the personalized IVP and Hovorka models. The same parameters included in the identifiability analysis were considered in the sensitivity analysis.
For the sensitivity analysis, the AMIGO2 Matlab toolbox was utilized [29]. The sensitivity is calculated by changing the parameters which result in various trajectories. Different measures can be applied to these trajectories, the most common ones are the root-mean-square deviation and the mean absolute deviation. In order to get a more robust result, several steps are taken. The simulations of the trajectories are repeated with different initial conditions or input schemes. The sensitivity values are normalized at each time instance for all the parameters. The toolbox utilized the Latin Hypercube Sampling (LHS). The LHS method provides an even distribution of all the parameters in the given range, with a relatively low amount of necessary samples. The parameter bounds were calculated by applying to the nominal values in [30] and [19] a deviation of ±5σ, where σ denotes the standard deviation provided in the referred articles, but saturating to 0 if negative values appeared. The simulation of a given parameter set is only considered in the calculation of the sensitivity if the BG remained in the following range: 1.8 < BG < 25 mmol/L in order to avoid entirely unrealistic trajectories.
If a general nonlinear system is defined in the form of 0 = f (ẋ, x, p, u, t), where p are the parameters, then the sensitivities s p = ∂x ∂p can be given with the Jacobians: ∂sp ∂t = ∂f ∂x s p + ∂f ∂p s.t. s p (0) = 0. The toolbox normalizes with the value of the parameter and with the value of the output in the given discrete step: where S p,k is the normalized or relative sensitivity of the parameter p in the k − th discrete step, y(k) is the output and p i is the i − th sample from the LHS. The ranking of the parameters are based on the final measure defined by: where n e is the number of different setups for experiments (initial conditions or input schemes), n lhs is the number of sampled values of the parameters, n k is the number of discrete points at which the system is evaluated.

3) Identification
On the one hand, in the identification of the average models, all the sensitive and mildly sensitive parameters are considered, as suggested in [26]. The insensitive parameters were set to the nominal parameters in [16] for the Hovorka model, and the means of the parameters listed in [19] were considered for the IVP model. On the other hand, two parameters in the sensitive group were individualized for the personalized models. The remaining parameters were fixed to the corresponding values in the average model. To identify the Hovorka model, the parametrization of the model in terms of insulin sensitivities (S i1 = k b1 /k a1 , S i2 = k b2 /k a2 and S i3 = k b3 /k a3 ) was utilized as in [16].
The identification consisted of two optimization problems: one that identifies insulin-related states by minimizing the normalized root-mean-square error (NRMSE) of the plasma insulin; and another one that identifies the insulin effect and glucose-related parameters by minimizing the NRMSE of the glucose measurement. The NRMSE is defined by where x model represents the glucose or plasma insulin obtained by the IVP or Hovorka models. x U V a denotes the "real" measurements, whereas the superscripts max and min refers to the maximum and minimum values of these measurements, respectively. This two-step optimization procedure targets a better identification of the insulin pharmacokinetics, as discussed in [19]. The inputs to the optimization problem were the glucose, the plasma insulin, the insulin infusion, and the meal disturbance. We generated these inputs from a 3-meal simulation of the average adult patient (for average models) or the adult cohort (for personalized models). Of note, the full knowledge of these inputs is impractical for real applications, but they were used in this study to ensure accurate identification of the parameters. The genetic algorithm (GA) in the Global Optimization Toolbox of Matlab 2018b (ga function) was utilized to solve the above-described optimization problem. The algorithm was configured with its default settings [31].
Finally, the identification of the models was assessed in terms of the NRMSE using, as in the identification, a 3-meal scenario, but with different instances of variability.

C. KALMAN FILTERS
Besides the estimation of the state variables, our goal was to estimate the R A as well. To do so in the case of KFs, the engineer has two main options. The most straightforward and largely applied technique is the augmentation of the state vector with the parameter or disturbance term, giving rise to the joint KF (JKF) [32]. Another possibility is to use separate KF for the state and parameter/disturbance estimation problem; this observer is called the dual KF (DKF) [32]. Both observers hypothesize that the parameters are static: in the prediction phase, the parameter in the next step holds the value of the previous step. For the application of the observers, the models are either discretized by the Euler or Complete method. In addition, the benefits of linear parameter varying (LPV) formulation were exploited in this article to handle non-linearities in KFs design. Consequently, more complex KF-based algorithms (such as Extended KF or Unscented KF) were avoided [7].
LPV modeling technique is an approach to handle the nonlinearities of the given system. If one the parameters is not a free signal such as an inner state variable, then it is called quasi-LPV or qLPV. The continuous time qLPV state-space representation (qLPV-SS) is defined as follows [33]: where the p(t) = [p 1 (t) . . . p R (t)] parameter vector consists of the so-called scheduling parameters Either the IVP or the Hovorka model accept a qLPV-SS representation: • IVP model. Selecting p(t) = G(t) as scheduling parameter leads to the following qLPV representation: where the additional 5th state variable is the estimated disturbance, namely the R A : • Hovorka model. Taking the qLPV representation is as follows: Since the selected parameters are not directly measurable, they have to be estimated by the observer and can introduce additional inaccuracies compared to an LPV model where the parameters are measurable. The additional 9th state variable is the estimated disturbance, namely the R A (t).

2) Applied Kalman filters
Utilizing the benefits of the discretized LPV representation, a linear discrete Kalman filter can be applied to the nonlinear system [34], [35]. In the case of the JKF, the dimension of the covariance matrix is extended by the number of parameters. In this case, because of the supposition of the parameters, the exact discretization method cannot be applied. This is due to the arising singularity issue, thus explicit Euler method is applied: where the parentheses indicate the continuous LPV representations of state-space matrices, while the brackets the discretized ones of the corresponding model. Sample time is denoted by T s , while the discrete step by k.
On the contrary, the DKF can be implemented using the exact discretization given by : As a result of the LPV discretization, the discrete propagation can be expressed in the following state-space form: where u[k − 1] is the integral of the continuous input from (k − 1)T s to kT s . Thex − and P − are the predicted state variables and error covariance, respectively, whilex and P are the posteriori state estimate and posteriori error covariance. The Q process noise covariance matrix is the tuning parameter and the λ is the forgetting factor. In the update phase, the predicted values are corrected with the new measurements: The main differences between the filters are summarized in Table 1, where n is the number of state variables and p is the number of parameters.  (n + p) x 1 n x 1, p x 1 state matrix dim.
(n + p) x (n + p) n x n error covariance matrix dim.
(n + p) x (n + p) n x n, p x p Kalman gain dim.

3) Kalman filter tunings
The study included 4 different KFs. The filters can be categorized by type and utilized model. Henceforth, a specific filter is identified by a consequent notation: the model name (IVP or Hovorka), followed by the abbreviation of the filter type (DKF or JKF). During the tuning process, the virtual patient and the observer utilized the same model, either the IVP or the Hovorka; dissimilarity between them was caused by a fixed +30% variability in all the model parameters. This amount of variability is in physiologically relevant ranges according to [19], [30]. The Q covariance matrix had nonzero elements only in the main diagonal.
To lessen the human factor, and make the comparison between observers fairer, we tuned the Q matrix with Matlab GA. The optimization is done on a 24-hour-long BG trajectory of a virtual patient with parameter variability. The optimization is driven by a cost function where only the transients were considered, as we observed in previous simulations that the filters cannot compensate for the steady-state offset. If the whole trajectory would be taken into account, the optimizer would adjust the transient to reduce the cost function; however, it can result in distorted, unfavourable responses, as shown in Fig. 1. The result of a tuning considering only the transient is denoted by "A", while a tuning taking into account the steady-state also is denoted by "B". Although "B" yields a lower NRMSE compared to "A", the latter one is preferred, since it provides a more realistic waveform.
The elements lying in the main diagonal of the Q covariance matrix were the tuning parameters, also known as genes of an individual. The lower bounds of the parameters were set to zero. The upper bounds were set to a value at least as large, as if this value would be the only non-zero element in the Q matrix, the filter would still converge to the measurements. The numerical results are shown in Table 4.
Since this tuning method utilizes all the state variables, it can be applied only in-silico, due to the non-measurable variables in clinical practice. This was a reasonable compromise to make since our main goal was to provide uniform settings during the investigations. There is a significant difference in the number of state variables between the models, the Dalla Man model (used by the UVa-Padova simulator) has 20 state variables, the Hovorka model has 8 state variables and the IVP model has 4 state variables. There are three equivalent variables between the models (with different units of measurement), namely the BG, Ip, and R A . To achieve that these common variables are presented with the same weights in the cost functions (44)- (45), two coefficients are introduced. By applying the weights 9 5 and 3 5 in (45) error in the BG, Ip and R A yields the same amount of cost in the Hovorka and the IVP model independently of their corresponding system order.
During the tuning process, it has been observed that the GA could overfit the optimization scenario and smallamplitude oscillations could appear or the shape of the trajectory can be unrealistic as shown in Fig. 1. To avoid this artefact, a measure of oscillation is formulated and penalized in the cost function. The penalizing coefficient, x osc , is calculated in the following way: where x is the state vector of the applied models, n is the number of samples, and the thresholds were dependent on the direction changes of the reference trajectory. These numbers were determined in a way, to ensure that the estimated trajectory has the same number of direction changes as the reference. The k penalizing coefficient increases the cost by an order of magnitude, enforcing the GA to find a solution under the threshold. We found that the CGMS noise does not affect significantly the final result, thus the generated measurement data was the true blood glucose values of one of the models. Taking into account the aforementioned factors, the cost functions J IV P and J Hovorka have been developed for each virtual patient model: J Hovorka = k 9 5 (e G + e Ip + e R A ) + 3 5 (e Q2 + e x1 + e x2 + + e x3 + e S1 + e S2 ) where e denotes the NRMSE of the given state variable (see the corresponding notations in [16] and [19] or in Section II-A) and k is the penalizing coefficient.

D. SLIDING MODE OBSERVER
In this research, the performance of two KFs (see Section II-C) were compared to an NSMO, which has a simpler structure and tuning. The NSMO has several appealing features such as finite-time convergence of the measurable states despite matching disturbances or the ability to reconstruct disturbances without assuming that they are constant [36]. These features have motivated the design of sliding mode observers for a wide range of applications in the literature, including anti-disturbance control [37], [38], and fault detection and isolation [39], [40]. The NSMO used in this work was proposed by [36, Section 3.5], which extends the results of [41] or [42] for non-linear models. Shtessel et al. [36] considers a nonlinear system given in the unmeasurable-measurable form: where y(t) ∈ R ny is the measurable output and x(t) := col(x 1 (t), x 2 (t)) ∈ R nx denotes the system state, where x 2 (t) represents the last n y elements of x(t). Plus, u(t) ∈ R nu is the known input; whereas, d(t) ∈ R n d is the unknown input appearing only in the measurable states x 2 (t). Finally, φ 1 (x, u, t) and φ 2 (x, u, t) denote the nonlinear terms; and A 11 , A 12 , D 2 , and C 2 refer to the system matrices with the proper dimension. Notice that the IVP model and the Hovorka model accept the form of (46) as shown below: and y(t) := G(t), the equations (5 -12) can be transformed into (46) with: The equations for the NSMO read as [3], [36]: with v(t) the discontinuous term defined by If the positive gain κ in (48) is chosen sufficiently larger than d(t), then the output error (e y (t) := y(t) −ŷ(t)) will converge in finite time to 0 [36, Chapter 3]. In this work, the maximum value of the disturbance d(t) was set to 30 mg/dL/min after exhaustive simulations under the same conditions as in the KFs tuning. Moreover, the estimation error of the unmeasurable states (e 1 (t) := x 1 (t) −x 1 (t)) will converge to 0 asymptotically. However, no gains applied to the unmeasurable states, which might be affected by modelling errors.
In addition, an estimation of d(t) can be realized by using the discontinuous term [36,Chapter 3]. Once the sliding motion is attained, i.e. e y (t) =ė y (t) = 0, the output error dynamics is described by where v eq is the so-called equivalent output error injection [36,Chapter 3]. Because of the asymptotic convergence of e 1 (t), v eq asymptotically tends to D 2 d(t). As a result, an estimation of d(t) is given by: where D * 2 = (D T 2 D 2 ) −1 D 2 . One of the main drawbacks of the NSMO is the chattering, that is, a zigzag behaviour of the estimated states variables that occurs when the continuous observer in (47)(48) is discretized with explicit methods [43]. To reduce the chattering, Acary et al. [43] presented an implicit Euler discretization technique for control purposes. In [3], that technique was applied to sliding mode observers. However, it required the estimation of the disturbance in the discretization, which reduces the performance. To overcome this problem, the approach of [44] was applied in this article.
The discretized equations of the NSMO, using the implicit Euler method, are given by where v[k] ∈ signm(e y [k]) and signm(·) is the multivalued sign function used in [43]. Although the term v[k] = signm(e y [k]) appears on the right hand of (51), the system is causal. . As an example, consider the implicit discretization equations of the observer (51) for the IVP model in (2 -4) given by: (54) To obtain a causal expression for v[k] two properties of convex sets were applied [43]: with N [−1,1] the normal cone onto the convex set [−1, 1] at point b. • Given a positive scalar M ∈ R and a closed convex non-empty set, C ⊆ R it follows that where proj(C; a) is the Euclidean projection of the point a into C. Finally, the application of the above properties to (56)

E. IN-SILICO COMPARISON 1) General description
The purpose of the comparison is to study the impact on the estimation performance of the following factors: • "Model": the structure of the model was considered by comparing the IVP model (Section II-A1) with the Hovorka model (Section II-A2). • "Observer": the observers were assessed with the comparison of the NSMO (Section II-D) and the KFs (Section II-C3). We performed 6 simulations -one for each combination of models and observers -with the academic version of the UVa-Padova simulator [21], which was extended with added features for intra-patient variability generation. The scenario consisted of a 24-h simulation including three meals (45 g at 7 h, 70 g at 14 h, and 60 g at 21 h) and multiple sources of variability, namely, CGMS error according to the default model in the simulator; sinusoidal-based circadian insulin variation with uniformly-distributed amplitude of ±30%; and, variability of subcutaneous insulin absorption according to a uniform distribution of ±30%. For each of the 10 virtual adults, we repeated the simulation three times with different variability instances.
Remark that when comparing different types of observers, tuning may bias the results. For the fairness of the comparison, as indicated in Section II-C3 and Section II-D, we tuned the three observers for them to perform similarly under the same tuning scenario. To this end, we set the Q matrix of the KFs by optimization, and the disturbance bound of the NSMO by exhaustive simulations.

2) Statistical analysis
We used the root-mean-square error (RMSE), the median absolute error (MAE), and the maximum absolute error (MaxAE) to study the estimation accuracy of the BG, I p , and R A . To avoid influencing the metrics with the initial condition transient, we neglected the first 30 min of the simulation when computing the metrics.
For each signal, a multifactorial ANOVA determined whether the factors "Model" and "Observer" and their interaction explained the variability found in the performance metrics. Since all the simulations shared the cohort of patients and the meals, the hypothesis of independence was unmet [45]. For this reason, we considered as new covariates the factor "Subject" -the identifiers for each of the virtual subjects -together with the interactions "Subject:Model" and "Subject:Observer". When a factor (or interaction) was significant, a pairwise comparison was performed to determine which level (or combination of levels in the interaction) cause the factor (or interaction) to be significant. Given the skewness of the distribution, the sign test was selected to determine if the median difference between the performance metrics of the groups is significantly different from 0 at a 0.05 confidence level [46]. The Benjamini-Hochberg p-value correction approach [47]  1.16 · 10 3 1.13 · 10 3 1.28 · 10 3 1.02 · 10 3 9.31 · 10 2 1.15 · 10 3 1.09 · 10 3 1.09 · 10 3 9.06 · 10 3 1.18 · 10 3 GEZI ( min −1 ) 3.03 · 10 −8 - 2.85 · 10 −2 was applied to control the false discovery rate in the case of multiple comparisons. The ANOVA and sign test only inform whether the observed differences among the levels of a factor are nonrandom (statistically significant difference) or they originated from the randomness of the simulations. However, these tests do not inform about the magnitude of these differences or their practical relevance. To analyse this information, we calculate two metrics: the eta-squared (η 2 ) -which measures the proportion of variance associated with each factor [48] -, and the 95% confidence interval around the median difference (with the correction in [49] to control the false coverage rate).

1) Virtual Patient Model
The analysis of the structural identifiability of the IVP model determines that all the parameters are structurally locally identifiable when the plasma insulin is measured, which is coherent with the results in [19]. The sensitive parameters are S I , EGP , C I , and GEZI; whereas, parameters τ 1 , and p 2 are mildly sensitive. The only insensitive parameter is τ 2 ; its nominal value was used instead of being identified.
Among the sensitive parameters, we individualize S I and C I because they are related to the steady state conditions of all the state variables. Table 2 shows the identified parameters for the 10 virtual adults in the UVa-Padova simulator.

2) Hovorka model
Likewise the IVP model, the Hovorka model parameters are structurally locally identifiable if the plasma insulin is known. The sensitive parameters of this model are V i , k e , S i1 , k 12 , S i2 , F 01 , S i3 ; the mildly sensitive are EGP , τ s and V g and finally, the insensitive parameters are the remaining ones. As stated in Section II-B, the sensitive and mildly sensitive parameters are included in the identification of the average model, while the insensitive ones are fixed to the nominal values in [16]. Parameters V i and S i1 are selected to be individualized, since they are the most sensitive gains of the insulin subsystem and glucose subsystem, respectively. Table  3 includes the identified parameters.

3) Identification assessment
The identified parameters of the average models (Tables 2  and 3) accurately fit the glucose; they achieved an NRMSE of 9.87% for the IVP model and 6.32% for the Hovorka model. Both models achieve a low NRMSE for the plasma insulin. However, the IVP model fits the plasma insulin better (3.69% vs. 6.38%). The main difference between the models is that the Hovorka model has a slower response due to a higher order in the insulin pharmacokinetics model.
Moreover, fit degrades when the average models are used for the different adult subjects in the cohort (see Fig. 2), specially for the BG fit provided by the Hovorka model. The larger degradation of the fit in the Hovorka model might be related to an over-parametrization of the insulin effect subsystem: whereas the simulator model uses two compartments to describe the insulin effect [21], the Hovorka model adds an additional compartment to describe the insulin effect on the glucose transport, i.e, the state variable x 1 (t). Also, it turns out that S i1 is a sensitive parameter which was chosen to individualize the model. This denotes a structural deficiency of the Hovorka model concerning the Dalla Man model. This over-parametrization could cause some overfitting problems, which make the model sensitive to different patients.
Model individualization improves the fit and reduces the difference in terms of NRMSE between both models. No statistically significant evidence exists to consider that the median difference in the NRMSE of the glucose between both models is different from 0, according to the sign test, at a confidence level of 0.05. Model individualization also reduces the variability of the plasma insulin, but the IVP model improves the accuracy of the Hovorka model.
Since the individualized models fit the plasma insulin and the plasma glucose better than the average model, we used the individualized models to design the observers.    4) in terms of RMSE and MAE because the former overestimates the R A after the postprandial peak and underestimate it in the steady-state (Fig. 3). The pairwise comparisons also confirm the superiority of Hovorka-based observers ( Table  6 and Table 7). However, the median difference is 0.04 mg/dL/min for both RMSE and MAE, which might be negligible from the application side.
The observer structure also has a statistically significant effect on MAE (the influence on RMSE is not statistically significant, but it is closed to be it). Specifically, DKF observers overperform NSMO and JKF in 0.04 mg/dL/min in the median for the MAE (Fig. 7) and in 0.05 mg/dL for the RMSE (Fig. 6). Conversely, NSMO and JKF behave similarly, although the NSMO observers have a higher peak  during the transient (Fig. 3), not considered in the calculation of the metrics (see Section II-E2). Unlike the analysis of RMSE and MAE, ANOVA of MaxAE identifies a statistically significant effect of the interaction between the model and the observer ( Table 5). The DKF observer designed with the Hovorka model illustrates the importance of this interaction since it has the largest MaxAE (Fig. 4); it even underperforms the observers designed with the IVP, which differs from the conclusions of RMSE and MAE.
Finally, although the ANOVA identifies the type of model, the observer structure, or its interaction as statistically significant factors, the analysis of the η 2 determines that these factors contribute less than 1% to the variability of the metrics (Table 5). In contrast, "Subject" and "Model:Subject" explain more than 90% of the variability, evincing that no observer technique managed to cope with the large interpatient variability of the simulation. In addition, this result unveils the limitations of the personalized models used to design the observer, even though we identified these models with the knowledge of the plasma insulin and meal rate of TABLE 8. Summary of ANOVA results of plasma insulin. It summarizes three metrics: the root-mean-square error (RMSE), the mean absolute error (MAE) and the maximum absolute error (MaxAE). Terms "η 2 " and "P-value" denote the eta squared effect size measurement and the P-value of the ANOVA F-statistics, respectively. The asterisk (*) indicates a P-Value lower than 0.05.   appearance.

2) Plasma insulin estimation
Model structure and observer type cause remarkable differences in the RMSE, MAE, and MaxAE (Fig. 5). Indeed, ANOVA supports for the three metrics significant differences between the IVP and Hovorka models on the one side and between the DKF, JKF, and NSMO on the other side ( Table  8).
The post-hoc analyses in Tables 9 -11 determine that no significant differences between the NSMO and the JFK exist, which agrees with Fig. 6, where the responses of both observers mostly overlap. DKF observers are the only ones that statistically differ; they lower, in the median, the RMSE in 0.5 mU/L, the MAE in 0.3 mU/L, and the MaxAE in 2.92 mU/L. Since the gain related to the I p in the KFs is near 0 and, it is exactly 0 for the NSMO, the only difference between observers is the discretization method: whereas the NSMO and the JFK are based, respectively, on an implicit and explicit Euler method, the DKF employes a zero-order hold discretization that leads to more accurate results.
The IVP model significantly improves the performance of the estimation compared to the observers designed with the Hovorka model -with median reductions of 0.92, 0.72, and 5 mU/L in the RMSE, the MAE, and the MaxAE, respectively. In Section III-A3, we already observed these differences between models when analysing the fit of the identified models, indicating that the use of the observer could not cope with the deficiencies of the models. The superiority of the IVP is related to a quicker response in the I p dynamics than the Hovorka model (Fig. 6). The slower dynamics of the Hovorka model might be explained by the 2-compartment model that describes the subcutaneous absorption with an equal transfer rate between compartments.    The interaction between the type of model and the observer structure is not significant. This result agrees with Fig. 5, where the DKF always reduces the RMSE compared to the JKF and NSMO, regardless of the utilized model. The type of model contributes more to the performance of the observer than the observer structure does (Table 8). However, as observed in the analysis of the R A , the variability explained by "Subject" and the interactions "Model:Subject" and "Observer:Subject" -67.6% of the RMSE, 68.0% of the MAE, and 62.7% of the MaxAE -is much larger than the sum of type of model, observer structure and interaction -11.1% of the RMSE, 12.2% of the MAE, and 15.7% of the MaxAE. Again, this reveals the limitation of the observers to overcome the inter-patient variability.   Summary of ANOVA results of CGMS. It summarizes three metrics: the root-mean-square error (RMSE), the mean absolute error (MAE) and the maximum absolute error (MaxAE). Terms "η 2 " and "P" denote the eta squared effect size measurement and the P-value of the ANOVA F-statistics, respectively. The asterisk (*) indicates a P-Value lower than 0.05.

3) Glucose measurement
The ANOVA in Table 12 determines that the type of model, the observer structure, and its interaction significantly explain the variance of the RMSE, the MAE, and the MaxAE. Although all the levels in the interaction between the type of model and the observer structure are statistically different, the median differences are, at most, 0.10 mg/dL for the RMSE, 0.07 mg/dL for the MAE, and 0.4 mg/dL for the MaxAE. Such minor differences are negligible from the clinical point of view.
Unlike the analysis of the R A and the I p , the type of model, the observer structure, and its interaction fully explain the variability of the performance metrics -98.7% of the RMSE, 98.9% of the MAE, and 97.6% of the MaxAE. Since glucose is the only signal available online, the observers handle the inter-patient variability, which contributes less than 1% to the variability.

IV. CONCLUSION
We designed three observers (dual Kalman filter, Joint Kalman filter, and non-linear sliding mode observer) based on two glucose-insulin models (Hovorka model and IVP model). The most sensitive parameters of these models were individualized for the 10-adult cohort in the UVa-Padova simulator.
A statistical analysis quantified the impact of the observer structure and model type on the performance of the observers in terms of root-mean-square error, mean absolute error, and maximum absolute error in the I p , glucose, and R A . It was an in-silico analysis; hence simulator accuracy might compromise the results. UVa-Padova simulator includes a realistic cohort of subjects [25], accurately fits the glucose profiles of clinical trials [23], and improves the estimation of the meal absorption compared to other models in the literature [24]. However, like any other simulator, it cannot fully represent all the complex mechanisms involved in the glucose-insulin system. Repeating this study with clinical data would be desirable, but considering the rate of glucose appearance would be unfeasible since its measurement requires elaborate and expensive procedures.
For the I p and R A , the low values of η 2 indicate that the observer structure negligibly impacts the global performance. Although statistically significant differences between observer structures exist, these differences are either too marginal or due to causes beyond the observer structure. For example, the method used to discretize the plant model caused the differences between observers in plasma insulin; the observer type barely contributed to these differences. The type of model influenced the performance of I p and R A estimations more than the observer one. This result is evident in the estimation of the I p where the same advantages of the IVP model, shown in the identification procedure, appeared after designing the observers, regardless of the observer structure.
Finally, the large values of the η 2 for the "Subjects" and "Model:Subjects" factors evince the lack of robustness of the observers against the inter-patient variability. Observers with a more complex structure -for example, moving horizon estimators [10] or observers based on neuro-fuzzy systems [50] -and new strategies to improve the personalization of the models must be explored and compared in the future to study if they increase the robustness.