University of Birmingham Learning pharmacokinetic models for in vivo glucocorticoid activation

To understand trends in individual responses to medication, one can take a purely data-driven machine learning approach, or alternatively apply pharmacokinetics combined with mixed-effects statistical modelling. To take advantage of the predictive power of machine learning and the explanatory power of pharmacokinetics, we propose a latent variable mixture model for learning clusters of pharmacokinetic models demonstrated on a clinical data set investigating 11 β -hydroxysteroid dehydrogenase enzymes (11 β -HSD) activity in healthy adults. The proposed strategy automatically constructs different population models that are not based on prior knowledge or experimental design, but result naturally as mixture component models of the global latent variable mixture model. We study the parameter of the underly-ing multi-compartment ordinary differential equation model via identiﬁability analysis on the observable measurements, which reveals the model is structurally locally identiﬁable. Further approximation with a perturbation technique enables eﬃcient training of the proposed probabilistic latent variable mixture clustering technique using Estimation Maximization. The training on the clinical data results in 4 clusters reﬂecting the prednisone conversion rate over a period of 4 h based on venous blood samples taken at 20-min intervals. The learned clusters differ in prednisone absorption as well as prednisone/prednisolone conversion. In the discussion section we include a detailed investigation of the relationship of the phar- macokinetic parameters of the trained cluster models for possible or plausible physiological explanation and correlations analysis using additional phenotypic participant measurements. )


Introduction
Glucocorticoids are widely used, with prescriptions for up to 2.5% of the population for immunomodulatory and antiinflammatory effects in a number of disease states ( van Staa et al., 20 0 0 ). Endogenous glucocorticoid hormones (including cortisol and cortisone) are a vital part of normal metabolism and physiological function. They are produced by the adrenal cortex under the regulation of the HPA (hypothalamic-pituitary-adrenal) axis, in addition to enzymatic action in tissue. Cortisol increases blood sugar, functions as an immune system suppressant, decreases bone formation and supports the metabolism of fat, protein and carbohydrates. Unfortunately these hormones are also associated with adverse features including central obesity, proximal myopathy, osteoporosis, hypertension, insulin resistance, psychological effects and excessive skin changes, which serve to reflect their action in a range of metabolically active tissues. These effects regularly affect patients receiving exogenous glucocorticoid treatment but are particularly demonstrated in the rare condition of endogenous Cushing's syndrome which can occur as the result of tumours of the pituitary or adrenal gland or as a result of ectopic secretion of ACTH ( Newell-Price et al., 2006 ) and these are associated with excess mortality ( Clayton et al., 2016;Hassan-Smith et al., 2012 ). In recent years initiatives such as Horizon 2020 have been launched in order to address the public health challenges of our ageing population. In Europe, those aged > 65 years made up 17 million of the population in 1998, a number projected to rise to 25 million by 2035 ( Eurostat, 2008;Sakuma and Yamaguchi, 2013 ). Healthy life expectancy has not kept up with this increase in longevity, with a gap between life expectancy and disability free life expectancy in the UK of 9 years for women and 7 years for men at the age of 65 ( Self et al., 2012 ). As a result there has been much research focus on the role of so-called pre-receptor metabolism of glucocorticoids via the 11 β-hydroxysteroid dehydrogenase enzymes (11 β-HSD), in metabolic conditions (including obesity and diabetes) as well as those associated with adverse ageing (osteoporosis and sarcopenia) which are similar to those found in glucocorticoid excess ( Gathercole et al., 2013 ).
The two isozymes of 11 β-HSD regulate glucocorticoid action at a tissue level by shuttling them between active and inactive forms. The type 1 enzyme (11 β-HSD1) amplifies local tissue glucocorticoid levels by replacing the C11-keto group with a C11-hydroxyl group, converting endogenous cortisone to cortisol ( Gathercole et al., 2013 ). This activity is also critically important for exogenously administered synthetic steroids, such as prednisone, which is converted by 11 β-HSD1 to its active form (prednisolone). The type 2 enzyme 11 β-HSD2, on the other hand, catalyzes in vivo mainly 1 the opposite reaction to 11 β-HSD1, enhancing the inactivation of cortisol/prednisolone to cortisone/prednisone. The question as to whether altered metabolism due to more subtle changes related to ageing or individual genetic differences for example could result in changes in glucocorticoid responsive tissues and account for adverse tissue effects is a compelling one. Assessing the variation in 11 β-HSD1 activity between individuals, and within individuals as a result of ageing and lifestyle changes, and the resulting morbidity is a significant question in the field of metabolic research. 11 β-HSD1 is widely expressed in metabolically active tissues including liver, adipose, muscle, bone, skin and the central nervous system and the enzyme has been implicated in the pathogenesis of associated diseases ( Cooper et al., 1993;Tiganescu et al., 2013 ). Cell culture and animal models have suggested that 11 β-HSD1 is a major regulator of obesity and of the features of glucocorticoid excess as seen in Cushing's Syndrome ( Clayton et al., 2016;Markey et al., 2016;Morgan et al., 2016a;2016b;. Pharmaceutical companies have developed a number of selective inhibitors of 11 β-HSD1 and are assessing their therapeutic potential.
There remains a lack of consensus on the most appropriate biomarker to measure 11 β-HSD1 activity, which include urine steroid metabolite ratios after 24 h collections, tissue biopsies to measure activity and gene expression and dynamic tests such as the prednisolone generation test. There are limited data on the latter test, which involves administration of oral prednisone and serial blood tests for measurement of prednisone and prednisolone levels, representative of in vivo activation of this synthetic glucocorticoid. This information could inform future study protocols.
Prednisone has an identical affinity for 11 β-HSD1 as cortisone and the interconversion of oral prednisone to prednisolone has been used as a marker of predominantly hepatic 11 β-HSD1 activity (reflecting first pass metabolism) ( Gathercole et al., 2013 ). To date only a few studies have used the prednisone generation test to gain insight into the potential benefits for well-being, healthy ageing and personalized medicine: Tomlinson et al. (2007) investigated the effects of 11 β-HSD1 inhibition in different compartments with regard to adipose/fat tissue based on serum cortisol and prednisolone generation in 7 healthy male volunteers; Cooper et al. (2002) looked at in vitro (cell culture) activity (as 1 11 β-HSD2 is also catalyzing 11 β-HSD1 activation, but less efficient. opposed to in vivo activity as we investigate in this contribution) while Hundertmark et al. (1997) and Chen et al. (1990) looked at pharmacokinetics in 6 healthy males with IV administration of prednisolone as opposed to prednisone.
In vitro biochemical analysis of serum provides a method for assessing the activity levels of these enzymes. However it is unclear how informative serum activity data are regarding the dynamic processes occurring in vivo . Methods aiming to answer this question include in vitro-to-in vivo extrapolation (IVIVE) ( Cho et al., 2014 ) techniques, which have become an important tool for prediction of human effective dosages. However, as pointed out in Sager et al. (2015) , IVIVE in general requires considerably more experimental and in silico data than alternative static models. A more direct measure of enzymatic activity in vivo is to introduce a prescribed dose of the pharmacological cortisone analogue, prednisone, and to take time series data of the resulting blood concentrations, along with the active metabolite prednisolone. This paper tackles the problem of how to analyse a data set consisting of such time series from a group of healthy volunteers.
A classical pharmacokinetic approach to dealing with such datasets is to combine multi-compartment ordinary differential equation model with a mixed effects statistical model of interperson variation ( Beal and Sheiner, 1982;Owen and Fiedler-Kelly, 2014 ). The mean ("fixed effect") and standard deviation ("random effect") associated with the rate constants representing reactions between prednisone and prednisolone then provide measures of central tendency and variability in 11 β-HSD1/2 activity through the population under study. For such systems there is often limited access for inputs or perturbations and the mathematical models that are generated invariably include state variables with associated model parameters which are unknown and cannot be directly measured. These limitations can cause issues when attempting to infer or estimate unknown model parameters from sets of observations and this can severely hinder model validation. It is therefore highly desirable to have a formal approach to determine what additional inputs and/or measurements are necessary in order to reduce, or remove these limitations and permit the derivation of models that can be used for practical purposes with greater confidence. Structural identifiability arises in the inverse problem of inferring from the known, or assumed, properties of a system a suitable model structure and estimates for the corresponding rate constants and other parameters. The analysis considers the uniqueness (or otherwise) of the unknown model parameters from the input-output structure corresponding to proposed experiments to collect data for parameter estimation (under an assumption of the availability of perfect, noise-free observations). This is an important, but often overlooked, theoretical prerequisite to experiment design, system identification and parameter estimation, since estimates for unidentifiable parameters are effectively meaningless. If parameter estimates are to be used to inform about intervention or inhibition strategies, or other critical decisions, then it is essential that the parameters be uniquely identifiable. In this paper a structural identifiability analysis of a linear compartmental model developed to characterise prednisone kinetics is performed using the Laplace transform approach. This analysis demonstrates that from a structural perspective the model is structurally locally identifiable for the given system observations, thus providing more confidence in the results obtained for subsequent numerical parameter estimation using actual times series data for these observations.
A "data driven" approach to analysing such a data set would be to apply unsupervised clustering methods from the field of machine learning such as k-means ( Bishop, 1995;Lloyd, 1982 ), self organizing maps ( Kohonen, 1982 ) and Gaussian mixture models ( Feldman and Langberg, 2011 ), in which the data from each participant are considered as a vector from a high dimensional space. Clustering in the space of observed data either implicitly Table 1 Subject characteristics of the 12 healthy adults recruited from the local population. The last column shows the maximum likelihood cluster membership estimate (see Section 3 ).

ID
Age ( or explicitly assumes a certain metric or structure of the data. In this contribution we will pursue a hybrid multidisciplinary approach of model based clustering integrated with structural identifiability analysis for interpretation of parameter relations and perturbation theory to reduce the dimensionality of the parameter space. The core of our contribution is based on interpretable probabilistic inferential models, aiming at grouping individuals in the space of pharmacokinetic models based on observed data. Each group/cluster is therefore represented by a prototypical probabilistic model with a specific pharmacokinetic parameterisation. This way our proposed strategy automatically constructs different "population" models, that are therefore not defined based on prior knowledge or experimental design, but come out naturally as mixture component models of the global latent variable mixture model. In contrast to data driven clustering techniques, we can analyse the parameter relationships and investigate possible or plausible physiological explanation. The investigation of further phenotypic measurements of individuals more probable to be represented by the same cluster model might lead to new hypothesis of interesting biomarkers for future investigation and clinical studies. This contribution therefore reveals both the capabilities and limitations of pharmacokinetic modelling combined with parameter estimation and machine learning, demonstrated on a clinical data set for prednisone conversion as an example of its potential broader application to modelling of in vivo biochemical systems in heterogeneous populations.

Clinical data
The investigations were performed in 12 healthy adults (6 men and 6 women) recruited from the local population at the Queen Elizabeth Hospital Birmingham with subject characteristics summarized in Table 1 . Inclusion criteria included body mass index between 20 to 30 kg/m 2 , females in the follicular phase of their menstrual cycle, and post-menopausal subjects off estrogen replacement therapy. Exclusion criteria included pregnancy, significant past medical history (like diabetes mellitus), ischaemic heart disease, cerebrovascular disease, respiratory disease and epilepsy, use of drugs including glucocorticoids, beta-blockers, dopamine agonists and anticoagulants. A clinical study in Birmingham was carried out between October 2010 and March 2013. Participants arrived at the NIHR-Wellcome Trust Clinical Research Facility in a fasted state by 8:30 AM. Baseline blood tests were taken at approximately 9:00 AM and analysed for urea and electrolytes, lipids, glucose (Roche Modular System), insulin (colourimatric ELISA from Mercodia) in addition TSH and free T4 (Advia Centaur; Bayer Diagnostics) were sent. 10 mg of prednisone was then administered orally with additional venous blood samples taken at 20- min intervals over a period of 4 h with serum extracted and analysed for cortisol and cortisone serum concentrations by liquid chromatography-mass spectrometry as previously described ( Hassan-Smith et al., 2015 ). Observations including height, weight and blood pressure were recorded and body composition was assessed using Dual-energy X-ray absorptiometry (DXA) scannning (Hologic Discovery: version Apex 3.0, Hologic Inc).
Ethical Approval. The study was approved by the Coventry and Warwickshire Research Ethics Committee (REC reference no. 07/H1211/68) and the Scientific Committee of the NIHR-Wellcome Trust Clinical Research Facility at the Queen Elizabeth Hospital Birmingham.

Linear kinetics, three compartment model
The model ( Fig. 1 ) consists of three compartments, an unobserved stomach compartment S in which the prednisone formulation is initially deposited after oral ingestion, a compartment P representing blood concentration (nmol/L) of the inactive metabolite prednisone, and a compartment L representing blood concentration (nmol/L) of the active metabolite prednisolone. Reactions between the three compartments are assumed to have linear kinetics. This contribution focuses on the in-depth analysis of a linear kinetic example as a proof of concept, paving the way for analysis with increased complexity, such as non-linear models, in the future. Rate constants for the linear kinetics include: k abs for absorption of oral prednisone into the blood, k PL and k LP representing 11 β-HSD1 and 11 β-HSD2 activity converting the inactive metabolite to active form, and vice versa, and excretion constants k Pex and k Lex quantifying excretion from the blood of prednisone and prednisolone respectively. The enzyme activity is assumed to be significantly faster than either the absorption or excretion respectively, justified by the almost immediate presence in the blood of prednisolone observed experimentally.
With all reactions assumed to have linear kinetics, the mathematical model therefore takes the form: This model has six parameters, S 0 , k abs , k PL , k LP , k Pex and k Lex and can be written more compactly in the form: T , y includes the observable dimensions y 2 and y 3 (corresponding to P and L ) as selected by matrix C as given in Eq.
(2) and the matrix A is defined by: The formulation Eqs.
(2) -(5) explicitly defines model parameter and model output structure, which will be analysed in detail in the following section.

Structural identifiability analysis
In order to estimate the (unknown) model parameters from the data available it is necessary to include in the model output structure, which corresponds to the function of the model variables that is to be compared with the data. Before actually collecting experimental data it is necessary to test those model variables with respect to this output structure for uniqueness, since estimates for unidentifiable parameters are meaningless. Such a structural identifiability analysis ( Bellman and Å ström, 1970 ) assesses whether the observed model output contains enough information to determine all of the model parameters uniquely ( Jacquez, 1996 ), and relates only to the structure of the model and output. For linear systems there are many well-established techniques for performing a structural identifiability analysis (for further details, and details of nonlinear approaches, see the tutorial by Godfrey and DiStefano III (1987) and other works in the same volume and the book by Walter, 1982 ).
Here the uniqueness, of the unknown parameters in a general systems model is considered with respect to the outputs. Let p ∈ ⊂ R r denote a vector comprising the unknown parameters in the model, which belongs to an open set of admissible vectors ( Evans et al., 2002 ). To make the parameter dependence of the model outputs more explicit it is written y ( t , p ).
Two parameter vectors p , p ∈ are indistinguishable , written p ∼ p , if they give rise to identical outputs: For generic p ∈ , the parameter p i is locally identifiable if there is a neighbourhood, N , of p such that p ∈ N , p ∼ p implies that p i = p i .
In particular, if N = in the above definition then p i is globally identifiable , otherwise it is non-uniquely (locally) identifiable . Notice that, for a given output, a locally identifiable parameter can take any of a distinct (countable) set of values. If there does not exist a suitable neighbourhood N then p i is unidentifiable and, for a given output, can take an (uncountably) infinite set of values.
A system model is structurally globally identifiable (SGI) if all parameters are globally identifiable; it is structurally locally identifiable (SLI) if all parameters are locally identifiable and at least one is non-uniquely identifiable; and the model is structurally unidentifiable (SU) if at least one parameter is unidentifiable.
An established approach to identifiability analysis of linear systems is to take the Laplace transform, reducing the initial value problem to an algebraic one. Denoting the Laplace transform of μ as, the initial value problem (3) is transformed to, hence, Defining the characteristic polynomial, the solution in Laplace space for the observable components is, Expressing the solution in the form of rational functions yields: where the coefficients of the powers of s in the numerators and denominators of Eqs. (11) and (12) are termed the "moment invariants" for these input/output expressions and, in terms of the original model parameters, are given by: The moment invariants for the system are assumed to be measurable (known) through the observations, and are considered unique.
From analysis using Descartes' rule of signs it can readily be shown that Eq. (25) has three changes of signs in the coefficients, which means it has maximal three positive (real) roots. Since the negative polynomial f (−k abs ) has no change of sign it has no negative roots.
Furthermore, Eq. (17) can be written as In summary the structural identifiability analysis yields: (1) k abs is structurally locally identifiable (SLI) with up to 3 possible solutions.

Dimensional analysis
Because the model is linear, the dependent variables may be scaled arbitrarily; choosing the initial stomach concentration S 0 as the scaling factor and denoting dimensionless variables with primes we have, Taking as time-scale the inverse of the rate of absorption from the stomach, i.e. t = k −1 abs t , the problem can be written in terms of dimensionless variables as, with initial condition (S , P , L ) = (1 , 0 , 0) at time t = 0 . The groups η P = k Pex /k abs and η L = k Lex /k abs are the dimensionless excretion rates of prednisone and prednisolone respectively, ρ = k PL /k LP is the ratio between the rates of forward and backward conversion between the inactive and active metabolites, and = k abs /k LP the ratio between the rate of stomach absorption and rate of backward conversion from the active to inactive metabolite. The assumption that conversion between metabolites occurs on a faster time-scale than absorption and excretion implies that 1.

Quasi-steady approximation
While model (28) is linear and therefore can be readily solved via matrix exponentials, it is possible to exploit the small parameter to yield an even simpler system with two fewer free parameters. Eq. (28) for S decouples and has the analytic solution Substituting for S and adding the remaining equations yields the following equation for the dynamics of the total inactive and active metabolites in the blood: with initial condition P + L = 0 at t = 0 .
Eq. (29) is exact. The presence of the small parameter motivates seeking an approximate solution with a smaller number of parameters. Examining Eq. (28) and retaining only terms O (1/ ) yields the quasi-steady approximation, therefore, P + L ≈ (1 + ρ) P ≈ (1 + ρ) ρ −1 L . Intuitively, this relation can be interpreted as 11 β-HSD1 activity being sufficiently rapid that the ratio between active and inactive metabolites is approximately constant over the time-scales associated with absorption and excretion. Substituting into Eq. (29) we then have the approximate model for total metabolite dynamics, with initial condition P + L = 0 at t = 0 . The analytic solution is, The dimensionless metabolite concentrations can then be determined by substituting expression (30) into (32) to give The above can be derived formally as the leading order terms in a perturbation expansion Following re-dimensionalisation, the approximate solution of the system Eq. (1) can be expressed in terms of the original variables and parameters as, The following four-parameter approximate model μ( t ; w ) can then be defined for the observed variables P ≈ μ 2 , L ≈ μ 3 : where the parameter vector w = [ w 1 , w 2 , w 3 , w 4 ] is related to the physical parameters by, In Fig. 2 we see excellent agreement between the numerical solution of the six-parameter model Eq.
(1) computed with the Matlab function ode15s and the four parameters approximate solution given by Eqs. (36) and (37) , with parameter values k abs = 4 ,

Maximum likelihood estimation for clustering of pharmokokinetic models
The data for the absorption of prednisone and conversion to prednisolone (abbreviated by P and L respectively) exhibit large variations in healthy individuals. Therefore, the question arises as to whether there are groups of people with similar trajectories over the course of time and if those groups have common phenotypical characteristics.
We assume the generating process for the observed data can be reasonably well approximated by the simplified model Eqs. (38) and (39) parameterised by w . The aim is to find parameter vectors w k corresponding to different models explaining k groups of conversion behaviour over the course of time, such that prednisone ( P ) and prednisolone ( L ) measurements in the serum of subjects within a cluster are more similar compared to another cluster. We use Maximum Likelihood Estimation of a Gaussian  Fig. 3 illustrates our machine learning model and assumptions on the data. The curves denote the cluster model concentrations of prednisone μ 2 ( t ; w k ) (red) and prednisolone μ 3 ( t ; w k ) (blue), whereas differently shaped markers represent individual measurements from participants for both compounds at different time points y j c (t j m ) respectively. The solid black line illustrates the normally distributed observations y j c=3 (200) centered at μ c=3 (200 , w k ) with variance σ 2 c=3 . Since measurements of different individuals are considered independent we write the likelihood of the data given parameters We maximize the log likelihood: We weight each observation to account for the potentially varying number of successful measurements o j c per participant. For the EM algorithm we assume that the latent Bernoulli random indicator variable Z j k is 1 if the data collection D j was generated from the model parameterised by w k and 0 otherwise. The complete log-likelihood is given by (48) Fig. 3. Illustration of the data, variables and assumptions to train our cluster algorithm. P and L represent the compartments for blood concentration of prednisone and prednisolone respectively. The markers show example data from subjects with ID 1, 3 and 10 (see Table 1 ).
The maximum likelihood estimate (MLE) is determined by marginalizing the likelihood of the observed data by iterating over 2 steps: E: Since the complete likelihood is not known we calculate the expected value of the log likelihood function with respect to the conditional distribution of Z given D given the current estimate of the parameters ( t ) : M: Find the parameters maximizing the following quantity:

Results
We performed experiments varying the number of clusters from 2 to 6 possible models to represent the time course of prednisone and prednisolone for the respective number of groups of subjects. For each number of clusters we use leave-one-out cross validation of the 12 people with 5 independent repetitions resulting in 60 clusterings for each experiment. The algorithm is always initialized with individual fits of randomly chosen training subjects corresponding to the number of clusters assumed. The resulting median log-likelihood Eq. (48) and 50% IQR versus the number of clusters is shown in the boxplot in Fig. 4 . Incrementing the number of clusters up to 4 increases the performance considerably, after that adding another cluster does not improve the clustering significantly. Based on this "elbow criterion" ( Ketchen and Shook, 1996 ) we further analyze 4 class clustering results in detail.
For the investigation of the models and parameters we first compute the probabilities P ij by counting how often two subjects i and j appeared together in the same cluster throughout the 60 independent runs, assuming 4 clusters. We define 1 − P i j as the pairwise distance and perform complete linkage clustering as shown in the dendrogram Fig. 5 . The cut-off value of 4 clusters is used as cluster assignment Z j k to initialize our EM procedure once more to find the final model parameters for detailed investigation. In this last experiment we initialize by a least squares individual group  fit based on 5 repetitions of random parameters and the resulting final models are shown in Fig. 6 .

Discussion
The final trained cluster models are very robust with respect to the random initialization in the training based on the given initial 4 cluster assignment of participants extracted from the dendrogram Fig. 5 . Therefore, only the average parameters cluster models are shown in Fig. 6 . Cluster 1 resembles the slowest absorption of prednisone and also the least growing concentration of prednisolone in the blood. The second slowest absorption rate is captured by cluster 2. People assigned to that cluster reach higher levels of prednisolone. People in the fourth cluster reach higher concentrations of prednisolone 20 min after administration of prednisone and after 100 min it starts to decrease. The third cluster containing only subject ID7 resembles an outlier from the data set.

Table 2
Relationship of the original parameters for the 4 cluster models shown in Fig. 6 . We show means and standard deviations based on 5 random initializations.  ( Table 1 ).
The conversion pattern of this subject is very different compared to all other individuals in this data set.
With the knowledge about the functional relationships of the parameters from the identifiability analysis we can investigate the trained parameters and investigate possible explanations for the behaviour. In order to view the relationship for the original parameters ( Fig. 1 ) we solve the linear equation system Eqs. (40) -(43) for each of the cluster models. The mean and standard deviation of those relationships for each cluster based on the 5 random initializations is shown in Table 2 . Very small standard variations quantify the above statement that the final cluster models are very robust with respect to random initialization. The excretions of prednisolone k Lex and prednisone k Pex exhibit a linear antiproportional relationship: the more prednisone is excreted the lower the rate for prednisolone. Only a small interval of k Pex values is possibly dependent on the value of b where k Lex is larger (see Fig. 7 ). The clusters C 1, C 2, C 4 to C 3 exhibit increasing absorption rate S 0 · k abs of prednisone into the blood as can be seen from Table 2 . Prednisone is converted to prednisolone by a factor of 5 times faster than that for cluster 1. For the other clusters this factor is increasing.
After estimating the final cluster assignment we got back to our medical expert asking for additional information about the participants to investigate characteristics within the same cluster. Of course, due to the small sample size the following analysis is only exploratory and we cannot make strong claims, but the investigation might lead to new hypothesis and provide some evidence for interesting measurements for future clinical studies. Therefore, we compute the Area Under the Curve (AUC) of prednisolone conver- sion of each cluster model k : and weight it with the trained mixture component γ jk for each participant j resulting in an individual AUC value dependent on the membership in each cluster. We test the association between these AUC values and measurement of the additional clinical satellite data for each individual. First we observe that there are no strong correlations between the age or BMI in general, with a pvalue of .45 and .128 respectively. Cortisol on the other hand is correlated to the AUC and exhibits a p -value of .006, which is not very suprising since prednisolone is a synthetic derivative of cortisol processed by the same enzymes as modeled here. Since the hormonal activity of female and male population is different we furthermore investigate the correlation dividing the groups by gender. There is some indication that specific measurements could be interesting as subject for further investigation, for example we observe a p -value of .029 for the BMI within men, while it is .47 for the female participants. The investigation of the phenotypic measurements of individuals that are more probable to be represented by the same cluster model might therefore lead to new hypothesis of potentially interesting biomarkers for future investigation and clinical studies.

Conclusions and future work
The core of our contribution is based on interpretable probabilistic inferential models aiming at clustering pharmacokinetic models for the absorption of prednisone in the blood. The collection of all individual blood measurements is modelled as a probabilistic latent variable model with pharmacokinetic models playing the role of mixture components. Therefore, each group or cluster is represented by a prototypical probabilistic model with a specific pharmacokinetic parameterisation. This way our proposed strategy automatically constructs different "population" models, that are therefore not defined based on prior knowledge or experimental design, but come out naturally as mixture component models of the global latent variable mixture model. In contrast to solely data-driven clustering techniques, we can analyse the parameter relationships and investigate possible or plausible physiological explanation. The strategy is suitable for sparse measurements, which is especially beneficial if these are collected by an invasive procedure. Our approach is designed for time series measurements potentially taken at different time points and is demonstrated on a clinical data set investigating the in vivo glucocorticoid activation by 11 β-HSD1/2 activity in healthy adults.
The model was thoroughly studied by identifiability analysis and then approximated using the perturbation method. The latent variable mixture of pharmacokinetic models is trained by an Expectation Maximization strategy, which is a widely used efficient natural choice for the estimation of such latent variable models. We achieved robust results for 4 prototypical cluster models resembling the prednisone/prednisolone concentration in the blood over the course of 240 min after admission of the drug for the 12 subjects in the data set. We observed a weak correlation of the AUC of prednisolone concentration in the blood with respect to the cluster models and the BMI of male suspects, which does not seem to be immanent for the female participants. The investigation of further phenotypic measurements of individuals more probable to be represented by the same cluster model might lead to new hypothesis of interesting biomarkers for future investigation and clinical studies. With the availability of more data in the future the approach can be extended to non-linear pharmacokinetic models, while this contribution serves as a proof of concept. Our proposal emphasise the potential of exploratory analysis of partially observed time series data by combining pharmacokinetics, structural identifiability analysis, dimensional analysis/perturbation theory with machine learning.