Incorporating Sparse and Quantized Carbohydrates Suggestions in Model Predictive Control for Artificial Pancreas in Type 1 Diabetes

People with type 1 diabetes (T1D) face the challenge of administering exogenous insulin to maintain blood glucose (BG) levels in a safe physiological range, so as to avoid (possibly severe) complications. By automatizing insulin infusion, the artificial pancreas (AP) assists patients in this challenge. While insulin can decrease BG, having another input inducing glucose increase could further improve BG control. Here, we develop a model predictive control (MPC) algorithm that, in addition to insulin infusion, also provides suggestions of carbohydrates (CHOs) as a second, glucose-increasing, control input. Since CHO consumption has to be manually actuated, great care is paid in limiting the extra burden that may be caused to patients. By resorting to a mixed logical-dynamical MPC formulation, CHO intake is designed to be sparse in time and quantized. The algorithm is validated on the UVa/Padua T1D simulator, a well-established large-scale model of T1D metabolism, accepted by Food and Drug Administration (FDA). Compared with an insulin-only MPC, the new algorithm ensures increased time spent in the safe physiological range in 75% of patients. The improvement is limited for those already well controlled by the state-of-art strategy but relevant for the others: the 25th percentile of this metric is increased from 74.75% to 79.06% in the population. This is achieved while simultaneously decreasing time spent in hypoglycemia (from 0.5% to 0.12% in median) and with limited manual interventions (2.86 per day in median).

which may result in severe complications and, overall, in a reduced life expectancy [1]. T1D care represents de facto a control problem, where the goal is to maintain BG levels in the nearly normal range between 70 and 180 mg/dL [2]. This range is known as euglycemia: lower glucose levels, referred as hypoglycemia, represents an immediate threat to patient health, since they can quickly lead to coma or seizure and even to death if not treated in time; high glucose levels, referred as hyperglycemia, may instead cause micro-and macro-vascular damages if prolonged in time that can result in blindness, foot ulceration, limb loss, kidney failure, and many other pathologies [2].
The most common therapy for managing T1D consists of several daily insulin manual administrations, frequent BG monitoring by fingerstick BG measurement devices, diet, and physical exercise (PE). Although this routine aims to prevent acute complications and to reduce the risk of long-term chronic dysfunctions, its practice is burdensome for patients and harshly affects their lifestyle.
Artificial pancreas (AP) systems are an emerging technology that combines different devices to automatize and optimize insulin administration: a continuous glucose monitoring (CGM) sensor measures subcutaneous glucose concentration; a closed-loop control algorithm, fed by CGM readings, computes the optimal insulin dosing; finally, an infusion pump automatically delivers insulin into the interstitium [3]- [6].
In the first generation of AP established in the last decade, a large spectrum of control techniques has been explored: proportional-integrative-derivative control (PID), described in [7] and later equipped in [8] with a mechanism to mitigate insulin absorption delays (called "insulin feedback"); control law emulating pancreatic beta-cell secretion [9] or medical doctor reasoning [10]; and fuzzy logic [11]. Model predictive control (MPC) technique appears to be particularly suited for glucose control in view of its capability to handle constraints (insulin delivery is bounded to be positive) and to mitigate for insulin absorption delay by employing predictive reasoning. For this reason, MPC was the technique of choice in several applications [12]- [23], as reviewed more in detail in Section II.
In the AP mentioned earlier, glycemia is controlled exclusively by modulating insulin infusion [24]. Insulin induces a reduction of glucose levels, and no other control variable is available to produce the opposite effect. State-of-the-art This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ systems may suggest the ingestion of fast-acting carbohydrates (CHOs) when glucose levels drop (or are predicted to drop) below a certain threshold, but this is done as an emergency measure to avoid potentially fatal hypoglycemic events, not as part of a coordinated control plan. Therefore, state-of-the-art control algorithms have to carefully avoid insulin overdosing, and this limits their ability to aggressively tackle postprandial hyperglycemia [6].
One of the most promising alternative developments of AP is represented by the bihormonal AP [25], [26], which employs a second infusion pump to administer glucagon (i.e., the antagonist hormone of insulin) as an additional mean to control BG levels. This further control input allows the algorithm to stimulate BG rise, thus offering new possibilities for producing a better BG control. Bihormonal AP systems have proved their potential to improve BG control with respect to conventional insulin therapy and to their single-hormonal counterpart in outpatient settings [27]. Despite these promising results, it is still unclear how to take the best advantage of the additional degree of freedom offered by glucagon. One option is to make a more aggressive use of insulin given the possibility to perform glucagon-based correction, but there are concerns on the safety of this approach. Glucagon action stimulates hepatic extraction, that is, the release in the blood stream of glucose stored in the liver in the form glycogen. Unfortunately, the hepatic extraction mechanism may fail in case of depleted glycogen storage (e.g., in due to repeated glucagon stimulation) or may be inhibited by substances, such as alcohol. As a consequence, there is no consensus on whether glucagon should be used only as an emergency safety measure, replacing CHO, or if it can be safely used as a true additional control variable. Furthermore, liquid glucagon forms and analogs that are stable at room temperature have just appeared [28]. Albeit very positive, only early evidences are available on the absence of the typical glucagon side effects, such as nausea, while long-term studies on sustained glucagon administration are still missing [26].
The introduction of a glucose-increasing control input was shown to be beneficial also to BG control in the intensive care unit (ICU), where Sun et al. [29] considered smart modulation of intravenous glucose infusion. Unfortunately, this solution is suited only for hospitalized patients and cannot be adopted in a system meant to be used in daily life.

A. Paper Contribution
Inspired by the rationale of the abovementioned solutions, where a glucose-lowering control input is combined with a glucose-increasing control input, in this work, we propose a control algorithm that resorts to two inputs for controlling the system: insulin and CHO. With respect to state-of-the-art algorithms, CHOs will not be used just as an emergency measure, but rather proactively employed as a second control variable. Smart proactive planning of future CHOs consumption enables a more aggressive tackling of hyperglycemia and, hence, has the potential to grant tighter BG regulation. This approach faces a major challenge, namely, that CHOs cannot be automatically administered (subcutaneous infusion is not an option), meaning that patients have to be heavily involved in the loop, being in charge of manually actuating this control action.
To make this approach practically viable, great care has to be paid on patient's extra burden. For this reason, we designed a control algorithm that only seldom resorts to CHO suggestion, with the possibility of defining "do-not-disturb" zones during night or working hours and to cap the maximum patient actions requested per day. Furthermore, the suggested CHO amount should be quantized among a few values to facilitate the assumption. As an example of the quantization importance, consider that CHO for T1D management comes often in the form of glucose tabs or sugar/gel packets. Each tab and packet have a predetermined amount of CHO, and patients can easily (and reliably) ingest an integer number of these tabs or packets. Ingesting a fraction of a tab or a partial dose of a sugar/gel packet is instead practically cumbersome and prone to errors.
MPC offers a powerful opportunity to embed both sparsity and quantization of the CHO control input in the control algorithm reasoning, by means of ad hoc linear inequality and integer constraints. In fact, Bemporad and Morari [30] proposed a framework for the modeling and control of systems that intertwine physiological laws, logical rules, and operating constraints (the so-called "mixed logical dynamical systems"). By doing so, the optimization problem within MPC will be formulated as a mixed-integer program (MIP).
To build the new MIP-based MPC for AP and to investigate the impact of the CHO suggestion strategy, our starting point is the Pavia/Padua modular MPC [22], [23]. This single hormone, MPC-based AP is one of the most extensively validated in clinical trials [31]- [35], where it proved to be safe and effective. The Pavia/Padua MPC is expanded with the MIP reasoning to create a dual-action MPC that can propose sparse and quantized carbohydrate suggestions. We then compare the performance achieved by the state-of-art algorithm with those achieved by its modified versions, to determine the impact of the new MIP module included without confounding factors.
Many AP systems already embed a safety module that trigger CHO-intake suggestions when patients reach or approach hypoglycemia, to promptly rise BG levels. The CHO-intake suggestions proposed by our novel control strategy are not intended as a substitute to these emergency mechanisms, but rather as a tool for proactively improving BG regulation. As a matter of fact, an emergency CHO suggestion algorithm will be implemented in both the classic Pavia/Padua MPC and in its modified version. This algorithm for emergency CHO intake will be based on the guidelines proposed by the American Diabetes Association and will be discussed more in-depth in Section IV-B.
The comparison between the two control strategies is performed on a large-scale model of T1D metabolism [36]. This tool is well established in the scientific literature and accepted by the U.S. Food and Drug Administration (FDA) as a substitute for animal testing in preclinical T1D therapies evaluation.
While we chose as a starting point the Pavia/Padua MPC, it should be noted that the proposed MIP modification can be included in several other MPC formulations of the AP control problem proposed in the literature, such as in zone MPC [16]- [18] or individualized MPC [16]- [23]. Notably, the proposed MIP extension could also be included in a dualhormone AP, to build a device that can resort to glucagon or, alternatively, to seldom quantized CHO suggestion to increase patients' BG.
Finally, it should also be noted that the problem considered in this manuscript is a human in the loop problem, where the effective orchestration of two control actions, one automatically performed by the control algorithm and one manually implemented by the patient, is needed. The solution proposed extends to other control problems where human intervention to help the controller is possible but costly or burdensome. This is the case in a number of biomedical problems of smart drug delivery, where a clinician, a patient, or another caregiver can be sporadically requested to intervene in assistance of the algorithm.

B. Paper Organization
This article will be structured as follows. Section II-A reviews different MPC formulations for AP systems, while Section II-B focuses on AP systems using both insulin and CHO to control glycemia. Section III focuses on the methods. In Section IV, we present the experimental setup, from the description of the simulator and simulation scenario to the choice of most of the metrics employed to assess the quality of the controllers. In Section V, we discuss the tuning strategy for the hyperparameters of the cost function of MPC. In Section VI, we train and test the newly synthesized control law by comparing its performances with a benchmark, state-ofthe-art MPC. Some final remarks are reported in Section VII.

A. MPC for AP
One of the main aspects that encouraged the adoption of MPC in AP systems is the flexibility that this control technique offers in the problem formulation. Different objectives can be easily considered in the optimization problem. A common choice is to minimize the square distance of BG from a glycemic reference, as done in [21] and [37]- [39] and in the Pavia/Padua MPC considered in this work [22], [23]. Other studies employed instead a zone-MPC strategy, penalizing BG levels predicted to fall outside of a target zone (the normoglycemic range) [13], [16]- [18]. In some studies, the distance from the glycemic reference [19], [20] or the values outside of the target range [19] are penalized by an asymmetric cost function, to account for the higher threat of hypoglycemia on the short term with respect to hyperglycemia. In [40] and [41], the objective is instead to control BG levels toward a desired risk of hypoglycemia.
Glucose prediction uncertainty is accounted in [46] by introducing a trust index to adjust insulin delivery. In [47] and [48], insulin delivery is constrained based on estimations of plasma insulin concentration. In [49], the information on the time of the day and time of the last meal, meal announcement, and sleep announcement is used to make discrete switches between different models to improve BG prediction.
The ability of MPC to incorporate MIMO system was leveraged in bihormonal AP systems [39], [42], where a second hormone is included as a control input; Turksoy et al. [45] propose a multi-input AP exploiting galvanic skin response and energy expenditure to handle PE. Garcia-Tirado et al. [21] also focus on PE management, by proposing an ensemble MPC that is informed by an anticipatory exercise signal computed from patients' exercise records.
Other studies exploited the knowledge that certain inputs, such as meal consumption, are a function of time. For instance, in [37], [38], and [50], the reference signal for BG concentration includes a "learning" term that is updated every day based on the meal consumption of the previous days.
More and more studies are considering adaptive formulations for MPC. Adaptation is key to handle intra-and interday patients' variability and can be achieved by recursively updating model parameters [42], [51], therapy parameters [52], or the weights in the MPC cost function [47], [48].

B. AP With CHO Suggestion
As anticipated, the ingestion of CHOs is commonly used in state-of-the-art AP as safety measure to avoid hypoglycemia, identified either by glucose crossing a critical threshold [53], [54] or through predictive models [55], [56].
To the best of our knowledge, the only study to date where CHOs are used proactively to improve glycemic control is the one by Moscardó et al. [57]. In this work, the authors propose a proportional-derivative controller for a bihormonal AP, using either glucagon or CHOs as a counter-regulatory action to insulin. CHOs intake is proposed in multiples of 15 g by quantizing control actions a posteriori and by accumulating residual CHO (details in [57]). The differences with our approach include the fact that quantization is imposed downstream of the computation of the control action, and sparsity is not enforced. On the contrary, we aim to embed both sparsity and quantization within the control reasoning.

III. METHODS
Patient's BG concentration, BG(k), is the variable we aim to control. In free living conditions, AP only disposes of a proxy of patients' BG, that is, the measurement of patients' glucose concentration in the interstitial fluid provided by the CGM sensor, g(k). We aim to keep this variable as close as possible to a suitable target value, g 0 . A key challenge in this regulation problem is that glucose concentration is susceptible to a significant, exogenous disturbance, i.e., meal consumption d(k). Meal consumption can be considered as a measured disturbance, since in the current AP, patient is requested to inform the system about incoming CHO intake (meal announcement), to trigger suitable feedforward control actions. Unfortunately, patient-provided CHO estimate,d(k), is affected by possibly large errors.
In this study, we consider two different strategies to design an MPC for an AP. The first strategy is the Pavia/Padua MPC [22], [23], a state-of-the-art single-hormone approach (singleMPC in the following), where the only variable used as a control input is insulin i (k). Total insulin administration must be constrained to be non-negative, i (k) ≥ 0.
In the second approach, we introduce the use of a second manipulated variable, that is, extra-CHOs intake c(k). Hereafter, we will call this control strategy dual-action MPC AP (dualMPC in short). In this case, extra-CHO intake is also bounded to be non-negative (c(k) ≥ 0), and as aforementioned in Section I, it will also be constrained to be sparse in time and quantized.
By comparing the performance achieved by singleMPC and dualMPC, the impact of the CHO suggestion strategy proposed in this manuscript can be deduced.
The Pavia/Padua MPC is usually embedded in a modular architecture [44], which includes also a safety module that has the authority to reduce or suspend the insulin infusion proposed by the MPC when a hypoglycemic event is approaching. This modular architecture was not introduced in this work, as additional modules operate as confounding factors in the assessment of the CHO suggestion strategy.

A. Benchmark: Single-Action AP
Following [22] and [23], the single-action MPC will be based on a linearized, averaged, and discretized (T s = 5 min) version of the model of insulin-glucose dynamics used in the UVa/Padua T1D simulator [36]. The model is linearized around the steady-state operating point x eq reached with basal insulin infusion (i (k) = i b (k)) and with no disturbance (d(k) = 0). The resulting model can be represented in a statespace form as where the sampling period T s was omitted for sake of readability, and the following hold.
from a patient's specific fictitious equilibrium point, 4)d ∈ R (mg) is the announced meal consumption.
The system in (1) is both stabilizable and detectable. The optimal sequence of insulin infusions is computed by the MPC at each control instant; by minimizing with respect toī (k), the following quadratic cost function: that considers a finite prediction horizon (PH) of N steps. The first term of (3) penalizes the 2 -norm of the deviation ofḡ from the set pointḡ 0 = g 0 − G b . The variable g 0 is set to 115 mg/dL, except for an increase to 160 mg/dL for 90 min after a meal consumption to avoid insulin overdosing.
The second term of (3) is introduced to discourage large deviations of the control actions from the suitable reference insulin signalī 0 (k). This signal is obtained starting from the patient manual therapy i manual (k) that consists of the basal insulin infusion, i b (k), plus a meal bolus, i meal (k) The signal i meal (k) is always zero, except at meal time when it takes the value [58] where CR is the CHO-to-insulin ratio, a subject-specific parameter representing the number of grams of CHO that are covered by 1 U of fast-acting insulin; CF is the correction factor, another subject-specific parameter defining the glucose drop due to the administration of 1 U of insulin. The signal IOB(k) is the insulin-on-board, that is, the quantity of insulin still active in the organism, computed as described in [59]. Before using the manual therapy insulin i manual (k) in the cost function (3), it has to undergo the same normalization used forī(k) in (2) Therefore,ī 0 (k) is always zero except at meal times when, to promote a meal compensation, a suitably normalized meal bolus is provided as a reference insulin signal to the MPC. The MPC has the authority to modify insulin at any time, including at meal time when it can manipulate the standard meal bolus by increasing or decreasing it, based on its predicted impact on the PH.
The third term of (3) represents a terminal cost, with ||x(k + N)|| 2 P =x(k + N) T Px (k + N). The weights q, r ∈ R in (3) determine the trade-off between the different penalty terms in the cost function. P ∈ R 13×13 depends on the other two and is computed as the unique nonnegative solution of the discrete algebraic Riccati equation where I is the identity matrix. It is easy to see that the optimization is impacted only by the ratior = r/q, which will be tuned specifically for each patient. This patient-specific tuning (called individualization) is discussed in Section V. The state vector at the current instant,x(k), is estimated via Kalman filter (KF), by employing the available information about CGM reading and past insulin infusion and CHO intake. Further details are provided in Appendix I or in [23]. With respect to [22] and [23], here, the MPC optimization problem accounts for the constraints on the minimal and maximal dose of insulin that can be administered at each sample time due to the actuator physical limits is equivalent to a quadratic program with suitable matrices H s and h s [22], and where the variable u(k) is a vector in the form and Following the MPC receding horizon policy, only the first action of this optimal control sequence is applied to the plant. The optimal sequence is then computed again in the following control instant.
All the numerical values used for the controller parameters are reported in Table I.

B. Dual-Action AP
With respect to our benchmark, the dual-action MPC will introduce a new control variable c(k) ∈ R (mg) in the model, representing MPC-suggested CHO intake The variable c(k) is a function of two Boolean support variables with c 1 , c 2 ∈ {0, 1} and γ 1 (mg), γ 2 (mg) suitable CHO quantities. Further constraints will bound only one of the Boolean variables to be active at the same time. Such a formulation quantizes extra-CHOs intake c(k) to three possible values: 0, γ 1 , or γ 2 . Here, we set γ 1 = 1000 mg/min and γ 2 = 2000 mg/min (thus corresponding to an ingestion of γ 1 · T s = 5 g and γ 2 · T s = 10 g). This formulation is easily adaptable, and an arbitrary number of CHO levels, N bool , can be obtained by introducing new auxiliary Boolean variables It should be noted that increasing the number of Boolean variables N bool increases for solving a mixed-integer quadratic program (MIQP) in worst case scenarios. Nevertheless, the time needed to solve the problems in standard instances is more affected by their overall structure of the problem than by the number of Boolean variables, that is often a poor indicator of their practical difficulty. An analysis of the computational cost is reported in Section VI-A.
The new cost function for the dual-action MPC will be Equation (12) penalizes the use of c(k) by introducing two penalty terms on the norms of c 1 (k) and c 2 (k).
Since c 1 (k) and c 2 (k) are Boolean ∀k, by setting s 1 = γ 2 1 · s and s 2 = γ 2 2 · s, for some s ∈ R + 0 , the resulting penalty corresponds to a standard penalty on the 2 -norm of c(k), which discourages the suggestion of large amount of CHO in the PH.
Alternatively, if s 1 = s and s 2 = s, the resulting penalty corresponds to a penalty on the 0 -"norm" of signal c(k) in the PH, that is, the number of non-zero values of c(k) in this time window. This penalizes the number of CHO suggestions, independently of the amount of CHO of these suggestions. By resorting to Boolean variables, the 0 -"norm" of c(k) is reformulated as a linear combination of two 2 -norms, and it becomes practically solvable with optimization solvers.
In this article, we opt for the second option, since we want to penalize the number of times the MPC requests a patient intervention more than the total amount of CHO ingested.
Analogously to the single-action MPC, it is easy to see that that the optimization is impacted only by the ratios r = r/q ands = s/q. In Section V, we will tune these hyperparameters.
Beside promoting sparsity of extra-CHO intake in the PH with the 0 penalty, we enforce a minimal distance s between two consecutive extra-CHO suggestions, to prevent multiple closed-in-time requests of patient intervention. This is achieved by employing inequality constraints on the Boolean variables c 1 and c 2 in a time window of s . Let us consider a single Boolean vector bound by the following linear inequality constraint: The first inequality of (13) can be rewritten as Since the elements of C i are Boolean, at most one c i (k + j − 1) in the interval j ∈ [1, s ] can be equal to 1. The second inequality poses a similar constraints on the interval j ∈ [2, s +1]. The only possible combination of two non-zero actions satisfying both constants is the one that has c i (k) = 1 and c i (k + s ) = 1. This consideration holds in the whole horizon, so that two elements of C i can be equal to 1 only if they are at least s elements apart. In our dual-action MPC, this sort of constraints is posed simultaneously to both Boolean support variables and it can be written in a matrix form straightforwardly extending (13). Due to the receding horizon policy, the abovementioned constraints restrict only the planned CHO, not the ones that are actually consumed. As an example, suppose that the optimal solution at time k −1 includes one extra CHO at the beginning of the PH. The planned extra CHO is consumed at time k. Then, at time k, a new optimization problem is formulated, and a CHO suggestion at the beginning of the new horizon would not be excluded by (13). To exclude this possibility and add memory of previously consumed CHO actions, we include another (time-varying) constraint, forcing all Boolean variables to be zero at less than s samples from previously consumed CHO. To this purpose, denote with k last the time of the last extra-CHO suggestion consumed and consider the timevarying constraint ⎡ Similarly, extra-CHO suggestions are turned off for m samples after the last meal announcement, that we assume to occur at time k meal last . This can be easily included in the previous set of constraints by defining Finally, we include two other practically relevant features in our framework.
The first feature allows to bound the maximum number of extra-CHO suggestion in one day, forcing it to be smaller than n max . This is done through a the time-varying constraint where n res (k) is the residual number of possible extra-CHO suggestion for a certain day, a variable that is reset to n max every day at 06:00 A.M. and decreased of one unit every time an extra CHO is suggested.
The second feature allows the introduction of "do-notdisturb" time portions, where the dual-action MPC is not allowed to suggest extra CHO. As an example, in this work, a "do-not-disturb" portion is set from 00:00 to 06:00 A.M. every day, so that the patient is not requested to perform manual control actions during the night. This can easily be achieved by forcing n res (k) to be zero during these time portions.
In conclusion, defining the optimization variable u(k) as the new optimization problem becomes with H d and h d are suitable matrices straightforwardly representing, in a matrix form, the cost function (12) and Again, for the receding horizon policy, we only apply to the plant the elements of u o (k) corresponding toī(k), c 1 (k), and c 2 (k). The optimal sequence is then computed again in the following control instant.
To minimize the risk of omitted CHO ingestion, the patient is requested to confirm the consumption of c(k) = γ 1 c 1 (k) + γ 2 c 2 (k) suggested by the controller. If the controller does not receive the confirmation, the system will repropose the CHO ingestion until the CHO consumption has been confirmed. At difference with meals, confirmed CHO ingestion in response to a controller suggestion does not trigger an insulin bolus. If the patient feels she/he will not be in the condition for eating a suggested CHO, she/he can set a temporary "donot-disturb" zone.
Finally, it should be remarked that the MIQP solver is granted to find the global optimum, provided that sufficient time is granted for the search. Although the preliminary results in Section VI-A are reassuring regarding the possibility to solve the optimization problem within a fraction of the sampling period, backup mechanisms have to be envisioned to handle the cases in which this should not happen. A straightforward yet appealing option is the use of the control action computed by singleMPC or by an unconstrained version of its optimization problem, saturated a posteriori. This former possibility, because of the existence of a closed form solution, is ensured to have minimal computational cost.

A. UVA/Padua T1D Simulator
The control algorithms we described in Section III are tuned and validated in silico by means of the UVa/Padua T1D simulator. This simulator consists of 13 nonlinear differential equations, accurately describing the metabolism of a T1D subject. These equations contain more than 30 parameters, and to capture the large variability in glucose dynamics among the individuals with T1D (inter-patient variability), the simulator provides 100 different realizations of the parameters  II   MINIMAL AND MAXIMAL VALUES OF THE POSSIBLE CHO AMOUNT AND  TIME OF CONSUMPTION FOR THE DIFFERENT MEALS set, thus enabling the simulation of 100 "virtual patients." The parameters realization is sampled from a joint distribution inferred from unique clinical data [60], therefore respecting the correlation among the parameters occurring in real subjects. Moreover, to account for the fact that even in the same individual glucose dynamics change over time (intra-patient variability), a subset of these parameters are time-varying. Because of its reliability in describing the T1D metabolism, a first version of this large-scale computer model was accepted by the FDA (2008) as a substitute to animals trials in preclinical testing of closed-loop control algorithms. In this study, we employ an updated version of the FDA accepted UVa/Padua T1D simulator S2013 [60], which presents several additional features to increase the realism of the testing scenarios: a model of the so-called "dawn phenomenon," suboptimal patients' therapy parameters, and models of patients' behavior and CGM sensor noise [61].

B. Simulation Scenario
We simulated multiple days of use of both the state-of-art MPC and the dual-action MPC. Simulations start at 00:00 A.M.
Each day of simulation is characterized by the consumption of three meals: a breakfast, a lunch, and a dinner. Times and quantities of meal consumption vary between subjects and days. They are drawn from uniform distributions reported in Table II, whose values are chosen to match the range of the data reported in [62]. Announced mealsd(k) are also affected by the carb-counting error (CCE) committed by patients, which means that the controllers have only access to inaccurate information about the real CHO content of the meal. In this version of the simulator, CCE is modeled according to [63].
To increase the reality of the simulations, we set suboptimal values of CR and i b (k). These suboptimal values are computed following clinical guidelines.
We ensured that all realizations of random factors described so far are equal when comparing different control algorithms, using the same seed in the random number generator for each patient in every different simulation.
In clinical practice, whenever the controller fails to avoid hypoglycemia, the patient eats rescue carbs (RCs) to quickly restore safe glucose levels. This safety intervention, called hypotreatment (HT), is included when simulating both singleaction and dual-action controllers. Therefore, in the following, we will distinguish between RCs for HT and planned carbs (PCs) proposed by the dual-action MPC. Details on the HT generation mechanism can be found in [63], and here, we only mention that they are triggered when CGM reading drops below pre-established hypoglycemia threshold levels.

C. Evaluation Metrics
Given the large number of studies in AP systems, a panel of international experts standardized in a consensus meeting the metrics to be adopted for AP system evaluation [64], [65].
We also consider the so-called area under the curve in hyperglycemia (AUC H ), defined as the area between the glucose trace g(k) and the hyperglycemic threshold BG H = 180 mg/dL with N sim being the number of simulated CGM samples. With respect to the percentage of time spent in hyperglycemia, which considers only the duration of the hyperglycemic episode, this metric also accounts for the magnitude of the hyperglycemic excursions.
Analogously, we consider the area under the curve in hypoglycemia (AUC h ), defined as , 0))T s with BG h = 70 mg/dL being the threshold for hypoglycemia.
Further relevant input-related metrics are the total daily insulin (TDI) administration and the number and amount CHO-intake suggestions. Suggestions are distinguished as RCs, PCs, and total CHO (TC) intake. These latter metrics are useful to evaluate the burden of our novel therapy on patients' lifestyle. A summary of these metrics is reported in Table III. For each of this metrics, we report the median and the interquartile range computed over the specific pool of subjects under exam. The statistical significance of possible improvements obtained with dualMPC, with respect to singleMPC, will be evaluated by means of a paired-sample t-test or a Wilcoxon signed rank test, depending on whether the populations are Gaussian or not. Either way, we will set the test significance level to 5%. The Gaussianity of the distributions is evaluated by means of a Lilliefors test with a significance level of 5%.

V. ALGORITHMS TUNING
In this section, we discuss the tuning of the hyperparameters of the control algorithms. Specifically,r in (3) remains to be chosen for the singleMPC, whereas in the dualMPC, there are two hyperparameters:r ands in (12). These hyperparameters regulate how aggressively the algorithm uses each input.
Because of the significant variability in each persons' metabolism, finding a fixed value for these hyperparameters In silico, hyperparameters individualization can be done by testing several possible points in the hyperparameter space and choosing the one that achieves the best control performance (assessed by a suitably defined scalar function). Unfortunately, this approach cannot be performed in real subjects, as some of the hyperparameters tested in the exploration could lead to unsafe BG levels, hence posing patients at risk.
To overcome this difficulty, Patek et al. [44] showed that the optimal value of the hyperparameter for in-silico subjects can be inferred via statistical regression from other subjects' parameters, such as BW or CR. Since these parameters are also available in real patients, the regression law trained in silico can then be used to compute effective personalized hyperparameters on a real subject.
To implement and evaluate this approach, we proceed as follows.
1) The cohort of 100 virtual patients is split in a training and a testing set (50 subjects each).

2) Training Phase:
a) The optimal hyperparameters are obtained on the training set by means of a hyperparameters optimization algorithm. b) The optimal weights are used to infer regression models that link them to patients' physiological and therapeutics values.

3) Testing Phase:
a) The regression models are used to compute suboptimal hyperparameters for the subjects in the testing set. b) The control achieved with these hyperparameters is evaluated on the subjects in the testing set. The control performance achieved is discussed in Section VI, both for the training and test phases. The detailed descriptions of the tuning phase and the description of the regression model can be found in Appendixes II-A and II-B, respectively.

A. Computational Cost
The optimization problem (15) was solved using MATLAB and ILOG CPLEX Optimization Studio (v. 12.9) [66]. On average, the time needed to solve the hardest instances of the problem (i.e., when CHO intake was allowed in any instant of the PH) was <1 s on a desktop PC, which is two orders of magnitude smaller than the sampling time T s = 5 min. The solver was always able to find an optimal solution.
Since the size of the problem (as a number of Boolean variables) is often a poor indicator of its practical difficulty, we tested the impact of an increasing number of Boolean variables N bool on the computational cost. Table IV reports the average, the standard deviation, and the maximum of the time needed for solving the hardest instances of the optimization problem. The standard deviation and the maximum of the computation time have an increasing trend, as N bool raises. On the other hand, the average computational time for N bool = 5 is lower with respect to all other cases, which confirms that increasing the size of the problem does not necessarily cause an exponential increase in the computational cost.
This analysis, that reporting computational time on a desktop PC, has to be considered a preliminary assessment. The computational time on a less performing hardware, better matching those used in an AP, remains to be evaluated with ad hoc implementation. Table V reports a comparison of the results obtained in the training phase with the two control algorithms. This comparison is performed on two days of simulation on the 50 patients of the training pool.

B. Performance in Training Phase
The optimal tuning of the parameters allows to achieve an improvement in every CGM-based metric with dualMPC with respect to singleMPC. Mean glucose is closer to the target of 115 mg/dL, and glucose variability is reduced, proving tighter glycemic control. Both the median value and the inter-quartile range of the TIR are improved (i.e., increased) with dualMPC. Similarly, the medians and inter-quartile ranges for all the other glycemic intervals, as well as AUC H and AUC h , are improved (i.e., decreased) by dualMPC.
This tighter glycemic control is achieved with a slightly more aggressive use of insulin, as highlighted by the larger TDI administration in dualMPC (median values from 38.88 to 39.83 U, about +2.5% median increase). Nevertheless, the average number of daily CHO-intake suggestions is acceptably low. The dualMPC algorithm suggests one CHO intake per day on median, whereas the 75th percentile is three suggestions per day. The amount corresponds to 10 g per day on median and a 75th percentile of 27.5 g. If we also consider the total intake of extra carbs (RCs + PCs), the median for dualMPC is 1.75 suggestions per day (or 26.25 g), whereas the median for singleMPC is 0.5 suggestions per day (7.5 g). These positive results shown in Table V are also supported by parallel coordinate plots of the time spent in hypoglycamia and euglycamia, as reported in Fig. 1. The hue of the lines in the parallel coordinate plot depends on the relative difference in performance obtained with dualMPC with respect to singleMPC. If time spent in hypoglycemia is reduced or time spent in euglycemia is increased for that patient, the line is green. Otherwise, the line is colored in red shades. This analysis highlights how the improvement seen in Table V hold for the large majority of the subject, with time in euglycemia worsening for only 1/50 subject. Hypoglycemic events are instead reduced for all the 50 subjects. Both improvements were found statistically significant ( p-value < 0.001).

C. Performance With Inferred Hyperparameters
In this section, we compare the performances of the two control strategies on the subjects of the test set for seven days of simulation. Both singleMPC and dualMPC are tuned with the suboptimal hyperparameters obtained from their respective regression models, a procedure that can be performed also in real subjects. This testing phase represents, therefore, an insilico evaluation of efficacy of the personalized controller on real subject. Table VI reports a comparison of all the metrics listed in Section IV-C for the two algorithms. As expected, there is a degradation of control performances of both controllers with respect to the results obtained on the training population in Section VI-B. For what it concerns the comparison singleMPC versus dualMPC, the CGM metrics are improved with the latter strategy, consistently with the results in Section VI-B. Specifically, the median value and interquartile range of TIR are better (i.e., higher) with dualMPC with respect to singleMPC. These results are achieved while simultaneously improving (i.e., reducing) TBR, in a statistically significant way ( p-value < 0.001). Similarly, the median and interquartile ranges of TBR, TAR, TDh, TDH, AUC H , and AUC h are better (i.e., lower) with dualMPC. Except for TAR, these improvements are also statistically significant.
Parallel coordinate plots of TIR and TBR for the testing phase are shown in Fig. 2. Most of the testing population (>75% of the patients) achieve better BG regulation with our dualMPC algorithm. Nonetheless, it should be noted that the fraction of patients that do not benefit of the dual action increased with respect to the optimal hyperparameters of Section VI-B.
The suboptimal tuning also leads, in median, to a more aggressive use of insulin: 40.9 U with singleMPC and 43.8 U with dualMPC, both larger with respect to the training phase (38.88 and 39.83 U, respectively). This means that, in the testing phase, dualMPC uses +7% more insulin than the singleMPC (the increase was only +2.5% in training).
Similarly, the use of CHO is increased with respect to the training phase. The median number of total consumption is 2.86 per day, compared to a median of 0.64 with singleMPC. Despite the significant increment with respect to the training phase, their number is still acceptable: the 75th percentile is four extra CHO per day (3.29 of which are suggestions triggered by MPC). The median quantity of extra CHO consumed amounts to 12.5 g with singleMPC and 32.86 g with dualMPC. Fig. 3 shows a window of three days for a representative virtual patient (subject #54). The three subplots in Fig. 3 show CGM signal, insulin infusion, and CHO intake, respectively, for singleMPC (pink dashed lines) and dualMPC (blue solid lines). The different types of CHO intake are represented with different markers: black circles for meals, triangles for PCs, and crosses for RCs.
In every meal of this time window, meal-time insulin boluses and postprandial insulin infusions are higher with dualMPC, resulting in the decreased postprandial hyperglycemic peaks with respect to singleMPC. Notably, this is achieved while simultaneously decreasing the occurrence of hypoglycemia. For instance, in this portion, the dualMPC faces only one short (5 min) episode of hypoglycemia (nadir: 65 mg/dL) at 23:00 of day 5, whereas singleMPC faces two episodes: a longer and more severe one, starting at 22:50 of day 5 and lasting 45 min (nadir: 45 mg/dL), in spite of the reduced insulin administration and the intake of RCs; a shorter one, lasting for just 5 min, at 10:05 of day 7 (nadir: 69 mg/dL). It should be noticed that in both cases, the two algorithms use the same quantity of CHO (20 g in both the cases) to contrast the hypoglycemic episode, but the dualMPC, because of an optimal planning of CHO consumption, is more effective in the task.

D. Robustness Tests
Since part of the control actions have to be manually actuated, it is important to investigate the impact of different levels of patient compliance. To this purpose, we tested different delays in consuming the suggested dose of CHO and investigated the case of missed ingestions. Furthermore, we test the control performance achieved in case of missed meals announcement, that is, when a subject does not inform the controller about an upcoming meal, preventing the delivery of feedforward insulin bolus and, thus, complicating  the compensation of this disturbance. Finally, a preliminary analysis aimed at assessing the impact of PE was performed. 1) Delayed CHO Intake: The impact of increasing delays in the CHO ingestion was tested by introducing a systematic delay between when the CHO suggestion was issued by the controller and when the CHO was actually eaten by the patient. Table VII reports the results for TBR and TIR obtained without delay (dualMPC) and with a delay of 20 min (delay20), 40 min (delay40), and 1 h (delay60).  results obtained with singleMPC. These results are obtained on patients of the test set in a seven-day simulation. A certain level of performance deterioration is clearly expected, and the large variability complicates the interpretation of the data, but the results show that the tested levels of incompliance do not cause major control performance deterioration nor pose the patient at risk.
Median TBR gradually increases, from 0.12% with dualMPC to 0.22%, 0.37%, and 0.52% with delay 20, 40, and 60 min, respectively, and similar increasing trend is observed in the 75th percentile. With a delay of 60 min, the performances overlap with those of the singleMPC. The delay impact seems smaller on TIR, where the improvement with respect to the singleMPC on the 25th percentile seems to persist.
2) Suggested CHO Not Eaten: A major concern is what can happen if the patient does not ingest the CHO required by the MPC algorithm. It should be acknowledged that this situation is a non-negligible source of risk that should be avoided as much as possible, first of all by proposing the dualMPC strategy only to highly compliant patients and by including several risk-mitigation measurements. One of them is the "do-not-disturb" zones mechanism: if a subject does not feel in the position to follow the algorithm suggestion, she/he can set up a period of time in which the algorithm is not allowed to produce CHO suggestions. A second mechanism to minimize accidental omission is to request to the patient to confirm the CHO ingestion. If the controller does not receive such an acknowledgment, it will repropose the CHO. Still, a fraction of the CHO suggested by the MPC may be not consumed. To investigate this condition, we simulated increasing probability of missing a CHO intake: 5% (miss5), 10% (miss10), and 20% (miss20). The simulation was performed on the test set subjects and lasted seven days. The results obtained with these simulations are reported in Table VIII, together with the results obtained with the dualMPC (0% missed suggestion) and singleMPC.
Also, in this case, a performance deterioration is expected, but the result in Table VIII suggests that the tested levels of incompliance do not cause a major deterioration of BG control nor pose the patient safety at risk. A progressive performance degradation seems to emerge, although the large variability complicates the interpretation of the data. In fact, the median TBR and the 75th percentile of this metric increase, as the probability of missing the CHO intake increases. When 20% of the suggested CHOs are not ingested, the median TBR of dualMPC surpasses the one achieved with singleMPC (although the 75th percentile remains lower). Also, in this case, TIR seems to be less affected by the disregarded CHO suggestion.
3) Unannounced Meal: As the majority of AP that reached the market, both singleMPC and dualMPC require the user to inform the controller about meal intake (the so-called meal announcement). The announcement triggers a feedforward control action (through the termī 0 and discussed in Section III). When this information is missing, the performance of the controller deteriorates.
To test the performance of the two controllers, we run a simulation in which meal announcements were forgotten with a probability of 20%. The simulation lasted seven days and involved the 50 subjects of the test set. Table IX reports the results achieved with singleMPC and dualMPC in this scenario.
The dualMPC performs better than singleMPC for what it concerns time in hypoglycemia: TBR went from 0.37%

4) Physical-Exercise-Like
Perturbation: PE can be beneficial to patients with T1D, and this is why it is often part of their therapy. In healthy individuals, PE has a hypoglycemizing effect due to the increased insulin-mediated glucose uptake in skeletal muscles and increased insulin sensitivity [67]. This effect is counteracted by the autonomous suppression of insulin secretion and by an increased activation of counterregulatory hormones. On the other hand, patients with T1D have an impaired hormonal counter regulation, and modulating the suppression of insulin infusion to match PE-induced glucose depletion is no easy task. For this reason, proper regulation of BG levels and avoidance of hypoglycemic events during and after PE are the main challenges of AP systems.
Unfortunately, the effect of PE is not rigorously included in the UVA/Padua simulator. As a preliminary investigation of its impact, we emulated PE glucose lowering effect by applying an extra-insulin delivery of +100% of the patient basal insulin. This perturbation was applied for 2 h, between 4 sc p.m. and 6 P.M. of the second day of simulation, and without announcing this increase to the controllers. The simulation lasted two days and involved the 50 subjects of the training set. Table X reports the results obtained with singleMPC and dualMPC in this scenario. By suggesting CHO consumption, dualMPC is capable of reducing the time spent in hypoglycemia with respect to singleMPC: in terms of

VII. CONCLUSION
The state-of-the-art, single-hormone AP systems for T1D treatment use exclusively insulin infusion to control BG levels. In these systems, CHO intake is suggested only as an emergency measure to avoid hypoglycemic events. On the other hand, the experience with bihormonal AP suggests that combining insulin infusion with a glucose-increasing control input, such as glucagon, can significantly improve BG regulation.
In this work, we propose a MPC algorithm for AP that uses both insulin infusion and proactive planning of CHO consumption to control glycemic levels. The second control input, that serves as the glucose-increasing input of this strategy, has to be manually actuated by the patient. Therefore, a major concern in this study was to minimize the burden of this therapy and the possible errors that patients may commit following it. We satisfied these requirements by constraining CHO-intake suggestions to be sparse in time and quantized. This was made possible by reformulating the control problem as a mixed-integer quadratic programming problem and by introducing ad hoc linear inequality constraints.
The new algorithm is obtained starting from the Pavia/Padua MPC, an MPC formulation for AP that has proved safe and effective in real-life clinical testing. The Pavia/Padua algorithm was expanded with the proposed MIQP extension.
By comparing the original algorithm and the new expanded one, it was possible to elucidate the impact of the new CHO suggestion module without confunding factors.
This comparison was performed on the UVa/Padua T1D simulator, an accurate large-scale simulator of the metabolic system in type 1 diabetic subjects. This tool was accepted by the FDA as a replacement of preclinical animal trials. In silico, the novel strategy leads to a statistically significant and almost systematic improvement of glucose regulation with respect to state-of-the-art single hormonal systems. Furthermore, the results show that the algorithm employs a limited number of CHO-intake suggestions (2.86 suggestions per day in median, equivalent to 32.86 g).
Nevertheless, one should note that this therapy still implies an increased commitment by patients, who are required to undertake some further actions along the day to improve their glucose control. Whether this could be perceived as an acceptable trade-off or not should be the object of dedicated clinical investigations. It should also be noted that not all patients might be willing or be allowed to resort to extra CHO (e.g., those in strict dietary regimens).
Further work includes an evaluation of the computation time requested for the solution of the MIQP problem on a hardware platform comparable with those used in the AP system. Moreover, it should be stressed that the UVa/Padua simulator, while far from being a toy example, remains a simplified testing setup. Final conclusions on the clinical efficacy of the proposed approach can be drown only with suitable clinical investigation.
As a final comment, it should be noticed that, while the MIP formulation has been proposed in this work to expand the Padua/Pavia MPC, the same reasoning could be applied to introduce parsimonious CHO suggestions in other MPC formulation for AP (e.g., zone MPC, individualized MPC, or even dual-hormone MPC).
The proposed strategy allowed to trade-off a certain level of human engagement in favor of a better control, and this strategy could be extended to other biomedical applications (including but not limited to smart drug delivery), where human intervention to assist in the control can be invoked, but such an intervention in burdensome, risky, or costly.

APPENDIX I STATE ESTIMATE
As described in [23], to design the KF, we introduced the model noise v x (k) and measurement noise v y (k) into model (1) We assumed that v = [v x , v y ] T is a multivariate, zero-mean, white Gaussian noise with covariance matrix Following [23], the measurement noise covariance matrix is set to R y = 9.8496 mg 2 /dL 2 . The process noise covariance matrix is a diagonal matrix whose diagonal elements q 1 , . . . , q 13 are reported in Table XI.

A. Optimal Tuning
For what it concerns the scalar function that evaluates the control performance achieved with a certain hyperparameters choice, we used the weighted area outside target range (wAOTR), defined as where the parameters α and β are used to assign different weights to hypoglycemia and hyperglycemia. Specifically, we set α = 10 3 and β = 1 to penalize hypoglycemic excursion more than hyperglycemia, since it is more dangerous for patients in the short time. The factor λ penalizes repeated requests of manual intervention: λ is set to λ = 1 if the average number of daily CHO intakes (including both PC and RC) is smaller than n daily = 5, while is increased λ = 100 otherwise. The hyperparameter optimization was performed via Bayesian search [68] on the 50 subjects of the training set. The algorithm is initialized with 12 random evaluation points, after which the objective function is evaluated 48 times. Fig. 4 reports the boxplot of the distribution of the optimal hyperparameterr for the singleMPC and the dualMPC. This hyperparameter regulates the algorithm aggressiveness in using insulin (the lower the value orr , the more aggressive the controller). As expected, the dualMPC can be more aggressive with insulin because of the possibility of compensating with carbs. Fig. 4. Boxplots of log 10 (r opt ) for singleMPC and dualMPC.

B. Regression Models
The optimal hyperparameters are inferred from easily accessible patient data, such as BW, and therapy parameters, such as CR avg , CF, or i b avg , where CR avg and i b avg are the average of the daily patterns of CR and i b (k), respectively. This is done via least square regression on a subset of these variables.
Let us consider the pth patient in the training set, p = 1, . . . , 50, and let us denote with r s ( p) the optimal value of r for the singleMPC in the pth patient. Similarly, we denote with r d ( p) and s d ( p) and the optimal values of r and s for the dualMPC in the same patient. Finally, the values of CR( p), BW( p), . . . are the accessible parameters of patient p. The following linear regression model between the optimal hyperparameters and the accessible parameters is assumed: With h, we indicate one of the hyperparameters, h ∈ {r s , r d , s d }. The vector φ h ( p) ∈ R 1×n reg contains a subset of n reg regressors in patient p, i.e., a suitable subset of accessible patient data. The subset of regressors varies depending on the considered hyperparameter and was selected via step-wise regression as described in the following. The vector θ h ∈ R n reg ×1 is the unknown vector linking accessible parameters and the hyperparameter under analysis. Finally, h ( p) is the regression error.
The model parameters θ h are computed on the data of the P training patients via linear least squares regression, i.e., minimizing the regression error h ( p) θ h = T −1 T log(h(1)), . . . , log(h(P)) T where ∈ R P×n reg is the matrix obtained by concatenating the P regresses vector φ of the patients.
The regression model can then be used to estimate suitable hyperparameters for a (possibly new) patient l aŝ Notably, (23) can be used also for real subjects. The regressor selection was performed via step-wise selection on the pool of possible accessible parameters {BW, CR avg , CF, i b avg }. The stepwise procedure was performed with the MATLAB function stepwisefit, starting from the The regressors choice of the procedure is in agreement with the finding of Toffanin et al. [23] for single-action MPC and with the univariate Pearson correlation coefficient of each accessible parameter and the optimal parameter in Table XII.
The parameters θ h obtained are reported as follows: