Practical identifiability of mathematical models of biomedical processes

The paper is devoted to a numerical study of the uniqueness and stability of problems of determining the parameters of dynamical systems arising in pharmacokinetics, immunology, epidemiology, sociology, etc. by incomplete measurements of certain states of the system at fixed time. Significance of parameters difficult to measure is very high in many areas, as their definition will allow physicians and doctors to make an effective treatment plan and to select the optimal set of medicines. Due to the fact that the problems under consideration are ill-posed, it is necessary to investigate the degree of ill-posedness before its numerical solution. One of the most effective ways is to study the practical identifiability of systems of nonlinear ordinary differential equations that will allow us to establish a set of identifiable parameters for further numerical solution of inverse problems. The paper presents methods for investigating practical identifiability: the Monte Carlo method, the matrix correlation method, the confidence intervals method and the sensitivity based method. There is presented two mathematical models of the pharmacokinetics of the C-peptide and mathematical model of the spread of the COVID — 19 epidemic. The identifiability investigation will allow us to construct a regularized unique solution of the inverse problem.


Introduction
The use of medicine has become a part of our daily routine. Therefore, their properties must be improved, and the medicine themselves must be as safe as possible. Some medicines are not completely absorbed by the human body, only half or even less, and the rest is excreted from the body or is lost in it, causing harm to it. Therefore, the question arises: how to create a medicine in tablets that would be absolutely useful and no frills? To create such a drug, it is necessary to know all the properties of the medicine, which is often very difficult. Thousands of experiments need to be carried out and requires large financial costs. For these reasons, mathematical modeling of the distribution of drugs within the body can help with it. Mathematical models of pharmacokinetics are based on the mass balance law and are often described by systems of ordinary differential equations (ODE). However, the coefficients of these systems are not always known, since they characterize the parameters of drug absorption in different "chambers" of the body, the rate of drug transitions between human organs, which is often difficult to measure. Identifiability analysis is the first step towards identifying unknown parameters. To get acquainted with this process, we write down the dynamic system in general form: Here x(t) -n-dimensional vector of variables, y(t) -k-dimensional function of experimental data, u(t) -input data function, θ -s-dimensional vector of parameters, t -time, x(t) -t time derivative of x(t), f and h -functions.
The question often arises as to whether the only set of parameters θ = [θ 1 , θ 2 , . . . , θ s ] satisfying the experimental data. On the other hand, from the available measured data, it is impossible to determine the set of parameters θ characterizing the process under consideration. Then the identifiability analysis can determine the uniqueness of the desired set of parameters from the input and output data.
The article presents methods for studying the inverse problem (1)-(2) for identifiability, such as the Monte Carlo method, the matrix correlation method and the confidence interval method. Using the example of two mathematical models of the pharmacokinetics of C-peptide of the form (1), the above methods were used to determine unidentifiable parameters. The results of numerical experiments and their analysis are presented.

Literature review
The study of identifiability of models was developed in the 80s of the XX century. The first publications on identifiability include works [1,2,3], the basic concepts and definitions were introduced in [4], and the most important first results were obtained in articles [5]- [8]. Among scientists from the CIS, the following authors can be noted: Shcherbak V.F. [9], Levakov A.A.
Let's introduce the concept of identifiability. A dynamical system specified by equations (1)-(2) is called identifiable if the set of parameters θ can be determined uniquely or in a finite way from the input data u(t) and measurement data y(t), otherwise the system is called unidentifiable.
The concept of identifiability is divided into two parts: structural identifiability and practical identifiability. Structural identifiability is a well-defined and binary concept. Practical identifiability is less clear and can be defined as the ability to estimate a given set of parameters with an accuracy that is considered satisfactory in the context of the study. A very clear explanation of the differences between these concepts is given in [28]: "the structural identification is based on the structure of the model and the available measured data, and gives an indication of the maximum amount of information that can be obtained from a theoretical experiment. On the other hand, practical identifiability depends not only on the structure of the model, but also on the experimental conditions". Therefore, even if structural identifiability is defined, then practical identifiability may not be valid.
In practical identifiability, the measurement error ε(t) is added to equation (2): Here ε(t) has a mean square value of 0 and a variance equal to σ 2 (t). Then, to determine the practical identifiability, a number of methods are used, such as the Monte Carlo method, the sample multiplication method, the matrix correlation method based on the Fisher information matrix, the confidence interval method.

Results and discussion of mathematical models of the pharmacokinetics of the C-peptide
The analysis of practical identifiability and determination of confidence intervals for unknown parameters was carried out using the PottersWheel software package. PottersWheel has been designed to provide an intuitive yet powerful framework for simulations based on dynamic systems data. Its key functionality is multi-experiment fitting, where multiple experimental datasets from different laboratory conditions are set simultaneously to improve the estimate of unknown model parameters and validate the model. New experiments can be developed interactively. Models are created from either program text or using the visual model designer. Dynamically generated and compiled C-files provide fast modeling and procedure fitting. Each function can be accessed using a graphical user interface or through the command line, which allows batch processing in custom Matlab scripts. PottersWheel is a Matlab toolbox of 250,000 lines of Matlab code and the C-programming language. Detailed installation descriptions and introductory videos can be found at www.potterswheel.de.
The results of numerical experiments are presented for a simplified and expanded two-chamber kinetic model of C-peptide.
Let us investigate the practical identifiability of a simplified two-chamber kinetic model of C-peptide with 3 parameters k 01 , k 12 and k 21 [29]: measurements in which are specified for the first chamber. Here, chamber 1 represents blood plasma and chamber 2 represents tissue. Accordingly, cp 1 (t) is the concentration of C-peptide in the blood, cp 2 (t) is the concentration of C-peptide in the tissues, SR(t) is pancreatic secretion (for the sake of simplicity, it is equal to zero), V C is the volume of distribution of chamber 1, initial cp 10 = 1500. Parameters k 21 and k 12 denote the transfer rate between two chambers, k 01 is the loss from chamber 1.
For each of the parameters and the measured concentration of the first chamber, confidence intervals were derived ( Fig. 5 -Fig. 8).
As shown in the graphs ( Fig. 5 -Fig. 8), the profiles of the likelihood function of the parameters k 01 , k 21 and k 12 reach the upper confidence intervals (relative threshold value in the form of a blue dashed line), which means that these parameters are practically identifiable.
For each of the parameters, the highest and lowest threshold values that the parameters can take were set, the ranges of values were taken in the region from 0.01 to 0.1. The setting of parameters and computational settings, such as the maximum number of steps, the relative growth χ 2 , are indicated in Table 1.    For each parameter, both upper and lower confidence intervals were achieved, both for the likelihood function profile and for the confidence interval based on the Hessian matrix (Table 2.  (Table 3), it can be noted that the parameters also turned out to be practically identified, confidence levels for intervals for likelihood profiles were achieved. Note that the values of the confidence intervals based on the Hessian matrix for different ranges of values turned out to be the same (Table 4). Considering the pairwise correlation of parameters ( Fig. 9), it is possible to determine a strong positive correlation for the parameters k 21 and k 12 , almost equal to 1, which means that these parameters are practically indistinguishable and their calculation should not be carried out separately.

Results and discussion of mathematical model of the spread of the COVID-19 epidemic
The sensitivity-based identifiability analysis of mathematical models of the spread of the COVID-19 epidemic shows that contagious disease parameter is identifiable by using daily confirmed, critical and recovered cases for 205 days. On the other hand, the predicted proportion of critical cases requiring a ventilator, as well as the mortality rate, is determined much less consistently. It was shown that in order to build a more realistic forecast, it is necessary to add additional information about the process (for example, about the number of daily hospitalized cases) and use quasi-solution [?]. It turn out that a compartment model of COVID-19 outbreak consisting of seven ordinary differential equations, describes the main trend in the spread of coronavirus infection and sensitive to the peaks of confirmed cases, but does not qualitatively describe small statistics (the number of critical cases on day t k , death cases as well).
A feature of the mathematical models of the spread of COVID-19 developed to date is to analyze the behavior of the asymptomatic course of the disease and the effect of the incubation period of the disease on the nature of the epidemiological situation in the regions. One of the main approaches to modeling the spread of epidemics is the Chamber approach (top-down Often, the parameters of the transition between groups are unknown or set in a wide range (for example, the incubation period of the disease is 2-14 days according to WHO estimates), which complicates the analysis of the model and the construction of qualitative scenarios for the development of the disease.
Consider the system of equations of the SEIR-HCD model, in which the population is divided into 7 groups [32,33,39]: Here S(t) is a susceptible individual at time t, E(t) is an infected non-infectious (nontransmitting virus), I(t) is an infected infectious (transmitting virus), A(t) is a patient without symptoms, H(t) -seriously ill, C(t) -critically ill (requiring a ventilator connection), R(t)recovered, D(t) -deceased. The averaged parameters of model (4) for the Novosibirsk region are given in Table 6 [35], [38], [37]. Table 5. Averaged parameters used in model 6 for Novosibirsk region. Let's assume that additional measurements are known about the number of detected, critical and fatal cases at fixed points in time: where b(t) ∈ [0, 1] -proportion of asymptomatic patients in identified cases, f k -number of identified patients per day k, C k -number of critically ill patients per day k.
Let the parameters q = (α E , α I , β, ε HC , µ, E 0 ) T ∈ R 6 be unknown. Let us analyze the semirelative sensitivity of the parameter vector q to measurements (6) for the mathematical model (4). To do this, construct ∂f i (t) ∂q k q * k , (f i ) = (I, C, D), i = 1, 2, 3, and analyze the value ∂f i (t) ∂q k q * k 2 (see Table 6). The quality of determining the parameters β, ε CH , and µ when solving the inverse problem is practically independent of the available measurements of the number of infected I(t), in contrast, for example, to the coefficients α E , α I , E 0 , which are more sensitive to these data. Table 6. Semi-relative sensitivities of various states of the model 6 to parameters, sorted in descending order.  Figures 5-8 show the graphs of the time variation of the sensitivity function ∂f i (t) ∂q k q * k depending on the variable parameter. The more variable the parameter is in dynamics, the higher the sensitivity to these measurements, which means that it will be determined more stably. Figure 9 shows the results of the analysis of the sensitivity of the parameters of the mathematical model (4) at various iterations of the orthogonal algorithm (the description of the algorithm is given in [33]. The horizontal axis represents the iterations of the orthogonal algorithm, the number of which is 1 less than the dimension of the vector of unknown parameters (the number of columns of the sensitivity matrix), and the vertical axis is the values of the perpendicular norms for the resulting transformations of the sensitivity matrices. It was shown that the most identifiable parameters were also the parameters of infection between asymptomatic and susceptible groups of the population α E , infection between the infected and susceptible population, which is associated with the contagiousness of the virus and social factors α I , as well as the initial value of infected or in the incubation period of individuals E 0 . The sequence of parameters obtained using the sensitivity analysis method for the mathematical model (4), located from the most to the least sensitive parameter: α E , α I , β, ε HC , µ, E 0 .    After analyzing identifiability, we can conclude that the least sensitive (more identifiable) parameters of the model to variations in data (errors) are α E , E 0 , and α I , in other words, these parameters are more stably determined when solving the inverse problem (4), (6). The most sensitive (less identifiable) to errors in measurements are the parameters ε HC , µ and β (with the lowest values of the perpendicular norms in the sensitivity matrix), that is, it is necessary to develop a regularization algorithm that allows you to control the quality of determining the sensitive parameters.

Conclusion
In the course of the work, methods of practical identifiability of parameters for a simplified twochamber kinetic model of C-peptide with three parameters were considered and applied ( [30]). All three parameters were identified as structurally identifiable under two different conditions of the range of values of the sought parameters. In the first case, when the range of values was around 100% of the true value of the parameter, then for the parameters k 21 and k 12 the lower confidence interval fluctuates around 45% of the desired value of the parameter, while the upper intervals are 30%. Moreover, the parameter k 01 , on the contrary, has a lower interval  Figure 9. The values of the norms of perpendiculars (vertical axis) for each parameter (different color) at different iterations (horizontal axis) of sensitivity-based orthogonal algorithm for the mathematical model (4). of about 20%, and an upper interval of 60% of the true value. In the second case, when the range of values is in the threshold from 0.00001 to 100000, then for all three parameters the lower confidence interval fluctuates around 35%, up to the upper confidence interval for parameters k 21 and k 12 increased to 130%, and parameter k 01 decreased to 40%. In addition, the correlation matrix showed that the parameters defined as structurally identifiable, are similar and have coefficients close to one. Such a strong positive correlation between the two parameters means that both parameters are highly dependent on each other and cannot be calculated separately. Further study of the methods and application of software packages will allow a more detailed description of this or that parameter and determine its identifiability for more complex models. An analysis of the sensitivity of the mathematical model of the spread of coronavirus infection COVID-19 showed that the parameter of the contagiousness of the virus is steadily determined by the number of sick, critical and cured cases detected daily. On the other hand, the projected proportion of hospitalized cases in critical condition and requiring a ventilator, as well as the mortality rate, are determined much less consistently. It has been shown that in order to build a more realistic forecast, it is necessary to add additional information about the process (for example, about the number of daily hospitalized cases).