Multidimensional analysis of human intestinal fluid composition

The oral administration of solid dosage forms is the commonest method of achieving systemic therapy and relies on the drug's solubility in human intestinal fluid (HIF), a key factor that influences bioavailability and biopharmaceutical classification. However, HIF is difficult to obtain and is known to be variable, which has led to the development of a range of simulated intestinal fluid (SIF) systems to determine drug solubility in vitro. In this study we have applied a novel multidimensional approach to analyse and characterise HIF composition using a published data set in both the fasted and fed states, with a view to refining existing SIF approaches. The data set provided 152 (fasted) and 172 (fed) samples with measurements of five variables (total bile salt, phospholipid, total free fatty acid, cholesterol and pH) in time-dependent HIF samples from 20 volunteers. The data sets for both the fasted and fed states are complex and do not follow normal distributions, but the concentrations of the amphiphilic variables are correlated. When plotted two-dimensionally, the data form a generally ellipsoid cloud with a positive slope, whose boundaries enclose published fasted or fed HIF compositions. The data cloud also encloses the majority of fasted and fed state SIF recipes and illustrates that the structured nature of design of experiment (DoE) approaches does not optimally cover the variable space and may examine media compositions that are not biorelevant. In either the fasted or fed state, a principal component analysis combined with fitting an ellipsoid to enclose the data yields 8 points that capture over 95% of the compositional variability of HIF. Over a short time scale (10 min), the average rate of concentration change of each variable is zero in both the fasted and fed states, and a Euclidean analysis highlights differences between the fasted and fed states and among individual volunteers.
The results indicate that a 9-point DoE (8 + 1 central point) could be applied to investigate drug solubility in vitro and provide statistical solubility limits. In addition, a single point could provide a worst-case solubility measurement to define the lowest biopharmaceutical classification boundary or for use during drug development. This study has provided a novel description of HIF composition. The approach could be expanded in multiple ways, by incorporating further data sets to improve the statistical coverage or to cover specific patient groups (e.g., paediatric). Further development might also make it possible to analyse the time-dependent behaviour of HIF and to guide HIF sampling and analysis protocols.


Introduction
Oral drug administration is the favoured route owing to patient preference and compliance, and solid oral dosage forms are the most common pharmaceutical product type. Since solids are not absorbed from the gastrointestinal tract, oral systemic therapy requires the solid drug particles to dissolve before the drug molecules pass across the gastrointestinal mucosa and transit through the portal venous system, via the liver, to the general circulation. Oral bioavailability is therefore controlled by a drug's solubility and dissolution in the gastrointestinal environment, its permeability through the gastrointestinal mucosa and its potential for enzymatic degradation in the gastrointestinal lumen, mucosa or liver [1]. Drug solubility in the gastrointestinal environment is therefore a key factor controlling bioavailability and has been incorporated into theoretical and practical biopharmaceutical concepts, for example the Absorption Potential (AP) [2], Maximum Absorbable Dose (MAD) [3], Biopharmaceutical Classification System (BCS) [4] and Developability Classification System (DCS) [5,6]. This has led to a focus on gastrointestinal drug solubility during development [7][8][9] and a recognition that determining its value, either computationally or experimentally, is a key stage. However, the experimental methods currently available to determine solubility are not adequate [10] to predict average in vivo solubility and its variability from in vitro measurements, and further development is warranted.
Drug solubility in an aqueous solution is controlled by the solution's composition. For example, the effect of solution pH on the solubility of ionisable drugs, described by the Henderson-Hasselbalch relationship, is well established [1,11]. However, gastrointestinal fluid is a heterogeneous system, with changes in pH and the variable presence and concentration of additional components either secreted endogenously [12] or added via ingestion of food [13]. Gastrointestinal fluid is therefore a complex system containing, for example, electrolytes, bile salts, lipids and lipid digestion products, cholesterol, proteins and enzymes, plus other components, and its composition also varies with anatomical location (stomach vs small intestine vs colon) [14] and the prandial state of the individual [15]. In addition, the gastrointestinal tract is dynamic and its composition changes continuously through a cycle of fasted and fed states, superimposed on the inherent biological variability of human subjects [16]. To assess the impact of the complex and variable composition of gastrointestinal fluids on drug solubility, and therefore absorption, two general approaches have been adopted: the aspiration of human intestinal fluid (HIF) [12] for direct solubility measurements, or the development of simulated intestinal fluids (SIF) based on the aspirated samples [17].
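As an illustration of the pH dependence mentioned above, the Henderson-Hasselbalch relationship gives the total solubility of a monoprotic weak acid as S = S0(1 + 10^(pH - pKa)) and of a weak base as S = S0(1 + 10^(pKa - pH)), where S0 is the intrinsic solubility of the un-ionised form. The sketch below is a minimal example under these textbook assumptions only (a single ionisable group, no salt-solubility limit, no micellar solubilisation by bile salts or phospholipids); the drug parameters are hypothetical.

```python
def total_solubility(s0: float, pka: float, ph: float, kind: str) -> float:
    """Henderson-Hasselbalch total solubility of a monoprotic drug.

    s0   -- intrinsic solubility of the un-ionised form (any concentration unit)
    pka  -- pKa of the single ionisable group
    kind -- "acid" or "base"
    """
    if kind == "acid":
        # Ionised (more soluble) fraction grows as pH rises above the pKa
        return s0 * (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        # Ionised fraction grows as pH falls below the pKa
        return s0 * (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


if __name__ == "__main__":
    # Hypothetical weak acid: S0 = 10 (arbitrary units), pKa = 4.5
    for ph in (5.5, 6.5, 7.5):
        print(ph, total_solubility(10.0, 4.5, ph, "acid"))
```

Well above the pKa, each unit increase in pH raises the weak acid's solubility roughly tenfold, which is consistent with pH being the dominant variable for acidic drugs.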
HIF aspiration requires oral intubation and determination of the catheter's anatomical location, followed by sample collection [15]. Several groups have developed sampling approaches [12,18,19] and determined the solubility of a range of drugs in both fasted [20] and fed [13] state HIF samples. These studies have demonstrated the variability of HIF composition between volunteers [21] and between intestinal regions [14,22]. HIF is cited as the most relevant system for conducting solubility studies and referred to as the "gold standard" [6,14]; however, its use is restricted by the difficulty of sampling, the small fluid volumes obtained and the inherent variability.
In order to circumvent the issues associated with HIF, SIF have been developed employing physiologically relevant physicochemical conditions (e.g. pH and osmolality) and concentrations of HIF components (e.g. bile salt and phospholipid) [17,23]. In early studies, SIF performance with respect to the dissolution or solubility of poorly soluble drugs was compared directly with HIF [18,19] in order to refine and develop optimised SIF compositions. SIF composition has subsequently been developed by multiple groups to improve performance relative to HIF, and several compositions are available [24][25][26]. This refinement continues: a recent publication proposed a further modification of fasted state simulated intestinal fluid (FaSSIF) [27] based on improved solubility determinations for ten poorly soluble drugs when compared with literature HIF solubility values. Multiple similar studies have also been conducted for fed state simulated intestinal fluid (FeSSIF) [17,23,26,28,29], with five variants present in the literature. A recent review covers the development and recipes of the various types of simulated fluid [24].
As the available SIF aim to mimic the average composition of fasted or fed HIF, they cannot be used to assess the sensitivity of a drug's solubility to the high variability in HIF composition. To examine the impact of media composition on solubility, statistical investigations of media composition, guided by the available literature information [15] on HIF and SIF, have been performed. Typical media variables (fasted: bile salt, lecithin, buffer, salt, pH, enzyme and fatty acid; fed: as fasted plus monoglyceride) and ranges were employed, and a design of experiment (DoE) approach was applied to study solubility using a range of poorly soluble drugs in either the fasted or fed state. For the fasted state two approaches have been adopted: a fractional factorial design including 7 variables at 2 concentration levels requiring 66 experiments (per drug examined) [30], or a 5-variable design requiring 24 experiments [31]. In the fed state, a single study has applied a D-optimal design with 8 variables at 2 concentration levels requiring 92 experiments [32]. Further studies have examined DoE designs with fewer experiments [33,34] that combine the fasted and fed states, culminating in a reduced-scale 9-point DoE [35]. These studies indicated that solubility varied in a drug-dependent manner by up to three orders of magnitude and that drug behaviour could be broadly classified according to ionisation characteristics (acidic, basic and non-ionisable). They also identified the key media variables influencing solubility and some variables, e.g. enzyme and salt, that had limited or no impact on solubility. For acidic drugs, the key solubility driver was pH, with limited impact from bile salt, lecithin and fatty acid. Basic and non-ionisable drugs behaved differently, with pH, bile salt, lecithin and fatty acid contributing equally to solubility and, in the fed state, monoglyceride contributing to a lesser extent. The results also revealed significant interactions between the variables (e.g. pH and fatty acid) influencing solubility and the presence of drug-specific behaviour in the systems.
Table 1. Fasted data.
The approaches presented above for the in vitro determination of intestinal solubility and for the investigation and refinement of SIF recipes have a range of limitations. The development of SIF composition based on drug solubility in individual, multiple or pooled HIF samples will be influenced by the already extensively discussed issue of HIF variability, the inability to link solubility to media composition and the limitations of fluid aspiration. The DoE approach is too experimentally intensive for routine application and is likely to include media systems of limited biorelevance because of the statistical design, which links high and low variable concentrations (for example, of bile salt and phospholipid). The use of a single-point in vitro solubility measurement in either HIF or SIF is also likely to be limited. Owing to the compositional variability, an individual HIF sample cannot be related to a likely population value, while an HIF pool (depending on the number of samples and donors) aims to determine solubility under average conditions. However, a single solubility determination using either HIF or SIF will not provide any information on solubility variability, which the DoE and related studies [32,36] have shown is inherently present in these systems.
Recently, a systematic study was published in which duodenal HIF samples from 20 subjects in the fasted and fed states were collected and comprehensively analysed [21]. The results of this study highlight the inherent variability of HIF and provided time- and prandial-state-resolved concentration data on multiple variables (pH, bile salt, phospholipid, cholesterol, free fatty acid and glycerides), some of which have been investigated in the DoE studies and various SIF recipes. In this paper, we have mathematically examined this data set using a novel multidimensional approach (each variable representing one dimension) to improve understanding of the variation within the total data set and within individual volunteer data sets, and to determine possible concentration correlations between media components. The aim was to explore the DoE limitation of linking high and low variable concentrations, to determine potential SIF recipes with improved relevance to HIF and to provide statistical boundaries for SIF media that could link in vitro measurement to potential in vivo coverage.

Data
The data set analysed in this paper has been previously published [21], and readers should consult that paper for details of the clinical protocol, biochemical analysis methods, initial discussion of the results and comparison with previous literature studies. In brief, HIF was collected from 20 healthy volunteers (equal numbers of male and female subjects, age range 18-31 years, BMI range 19-25 kg/m²) in both the fasted and fed states. After an overnight fast (> 12 h), a nasogastric catheter was inserted and volunteers were administered 250 ml water prior to fasted state sampling every 10 min for a 90 min period. Subsequently, 400 ml Ensure Plus was ingested, followed by 250 ml of water after 20 min, to represent the fed state, with sampling every 10 min for a 90 min period [21]. The samples were analysed for pH, phospholipids, cholesterol, bile salts, lipid content, and pancreatic lipase, phospholipase A2 and nonspecific esterase activity. For the purposes of this analysis, individual bile salt and fatty acid species concentrations were summed to provide total concentrations of bile salts and free fatty acids, and the enzymatic values were not considered. Enzyme activity has previously been shown not to affect solubility in equilibrium analyses [30,32] and was therefore excluded, leaving five variables per sample (pH, total bile salt, phospholipid, free fatty acid and cholesterol), with volunteer and sample time as additional variables.

Statistics
Basic statistics and data set comparisons were performed using Prism 8. Correlation statistics were calculated using IBM SPSS Statistics v25. The Euclidean centre point in 5-dimensional space (each variable being one dimension) of each data set was calculated using FileMaker Pro Advanced 17; this point does not assume a distribution function and differs from the mean and median variable values, which are calculated in only one dimension. Two-dimensional graphs were plotted using DataGraph v4.4. All of the preceding software was run on a Mac OS X 10.13.6 computer.
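How FileMaker Pro computed the Euclidean centre is not described. One common distribution-free multivariate centre is the geometric median, the point that minimises the sum of Euclidean distances to all samples. The sketch below computes it with Weiszfeld's algorithm; treating the paper's Euclidean centre as the geometric median, and applying it to raw (unscaled) concentrations and pH, are assumptions.

```python
import math
from typing import List, Sequence


def geometric_median(points: Sequence[Sequence[float]], tol: float = 1e-9,
                     max_iter: int = 1000) -> List[float]:
    """Weiszfeld iteration for the point minimising the summed Euclidean distance."""
    dim = len(points[0])
    # Start from the coordinate-wise mean
    est = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim
        denom = 0.0
        for p in points:
            dist = math.dist(p, est)
            if dist < 1e-12:
                # Simplification: skip a sample that coincides with the estimate
                # (the full Vardi-Zhang correction is not implemented here)
                continue
            w = 1.0 / dist  # inverse-distance weight
            denom += w
            for d in range(dim):
                num[d] += w * p[d]
        if denom == 0.0:
            return est
        new = [n / denom for n in num]
        if math.dist(new, est) < tol:
            return new
        est = new
    return est
```

For three coincident samples at (0, 0) plus one at (10, 10), the coordinate-wise mean is (2.5, 2.5) while this centre converges to the cluster at (0, 0), illustrating why such a point can differ from per-variable means.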

Multidimensional analysis
The aim of this analysis was to summarise the HIF composition data.
The centre of the ellipse is (h, k), the medians of the two variables plotted. The radii of the ellipse in the x and y directions are a and b, respectively. Their values are calculated from the eigenvalues of the covariance matrix of x and y, are equivalent to the variability in the direction of each of x and y around a fitted line, and have been scaled by 1.5 to cover approximately 90% of the distribution of points. The angle α is the angle of rotation, which was obtained from the correlation information contained in the eigenvectors. The parameters of the ellipse for each pair of variables are shown in Tables 5 and 6. When more than two variables are considered, principal component analysis (PCA) was used to extract the information required to construct a set of points that span the space of the observations. The rotation matrix computed as part of the PCA was used to assess the slope of the ellipse axis in each of the dimensions, giving a rotation around the centre of the ellipse, which was represented by the set of medians for each of the variables in the data set. The PCA variances indicate the variability in each of the rotated dimensions, and these were scaled by 1.5, as before, to provide sensible bounding limits. Using PCA means that a smaller subset of points can be used, provided that the variability explained by the first m components is large enough. A visual representation of these points is shown in Figs. 6 and 7, with further discussion in Section 3.
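An ellipse centred at (h, k) with radii a and b and rotation α can be written in the standard form below; this explicit equation is a reconstruction from the definitions above rather than one stated in the text:

$$\frac{\left[(x-h)\cos\alpha+(y-k)\sin\alpha\right]^{2}}{a^{2}}+\frac{\left[-(x-h)\sin\alpha+(y-k)\cos\alpha\right]^{2}}{b^{2}}=1$$

A minimal sketch of the parameter calculation follows. It assumes the sample covariance (n - 1 denominator), applies the factor 1.5 to the square roots of the eigenvalues (the text does not say whether variances or standard deviations were scaled) and takes α as the angle of the major axis. Whether the data were log-transformed before fitting is not stated, so raw values are used.

```python
import math
from statistics import median
from typing import Sequence, Tuple

SCALE = 1.5  # factor quoted in the text; applied to sqrt(eigenvalue) by assumption


def fit_ellipse(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (h, k, a, b, alpha) for a median-centred, covariance-oriented ellipse."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # Closed-form eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    disc = math.sqrt(max(tr ** 2 / 4 - det, 0.0))
    lam1, lam2 = tr / 2 + disc, tr / 2 - disc
    # Angle of the major axis (direction of the leading eigenvector)
    alpha = 0.5 * math.atan2(2 * sxy, sxx - syy)
    a = SCALE * math.sqrt(lam1)
    b = SCALE * math.sqrt(max(lam2, 0.0))
    return median(x), median(y), a, b, alpha
```

For perfectly correlated data (y = 2x) the minor radius b collapses to zero and α equals atan(2), which is a quick sanity check of the orientation.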

Initial data analysis
Based on the sampling protocol, the maximum number of HIF samples in the study in either the fasted or fed state is 180, which would provide a total of 900 individual variable measurements. As sampling was not always feasible, the number of samples obtained was 162 in the fasted state and 175 in the fed state; this has been classed as the full data set. As not all of these samples could be fully analysed, only 152 fasted state samples and 172 fed state samples contained a matched measurement of all five variables (i.e. the same subject and sample time point with measurements of pH, total bile salt, phospholipid, free fatty acid and cholesterol); this has been classed as the matched data set. The basic statistics for the fasted state data sets (full and matched) are presented in Table 1, with the fed state data in Table 2. No variable follows a normal distribution as assessed by the Kolmogorov-Smirnov test, and only two, phospholipid in the fasted state and bile salt in the fed state, have log-normal distributions. This result, in combination with the calculated skewness and kurtosis values, indicates that these data sets are complex to describe and analyse using simple statistics (see Fig. 1 and supplementary Figure S1). Since the matched data set is a subset of the full data set, a non-parametric Mann-Whitney comparison was performed for each variable in the fasted and fed states. No significant difference (P < 0.05) was found between the full and matched data sets, indicating that the selection and reduction in sample number had not affected the data; all further analyses were therefore performed using the matched data sets only.
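The full-versus-matched comparison above relies on a Mann-Whitney test. A minimal stdlib sketch of the two-sided test follows, using tie-averaged ranks and the large-sample normal approximation without tie or continuity corrections; the analysis itself was run in Prism 8, whose exact settings are not stated, so this is illustrative only.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie run
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, p
```

In practice a library routine such as `scipy.stats.mannwhitneyu`, which offers tie and continuity corrections, would normally be preferred.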
A comparison of the fasted and fed state data sets indicates a significant difference between the two states for the five variables (supplementary Figure S1), which is expected based on the published literature [15,37]. The matched data set provides a cloud of points containing five variables, or dimensions (the concentrations of bile salt, phospholipid, free fatty acid and cholesterol, and pH), which cannot be presented in a single two-dimensional graph. To visualise the individual data points two-dimensionally, the bile salt concentration (a universally measured HIF property [12]) has been chosen as a constant x-axis, with each of the four remaining variables on the y-axis. This produces a set of four graphs representing all variables (Fig. 1), with a log/log scale (except for pH) chosen so that the points sit within the axes. This permits easier visualisation of the data as a cloud and highlights differences and overlaps between the fasted and fed states (Fig. 1). In general, the mean value tends towards the higher concentration end of the distribution, with the median and Euclidean centre points visually more representative of the distribution centre. Although the difference between the mean and median may be exaggerated by the use of a log/log plot, the values are presented in Tables 1 and 2 for comparison. Fig. 1 also illustrates that applying standard distribution statistics, for example the mean with a standard deviation or the median with a centile, is not appropriate, since in general the distributions have an ellipsoid shape with a slope, i.e. a positive correlation (see section 3.2).

Amphiphilic variable correlations
Analysis of the data sets for correlations between the variables is presented in Table 3 for the fasted state and Table 4 for the fed state. In the fasted state data there is a positive correlation between all the amphiphilic variables (bile salt, phospholipid, free fatty acid and cholesterol), with mixed correlations for pH. This is also highlighted in the principal component analysis (Tables 5 and 6) [41]. The intra-state correlation of amphiphile concentrations is expected based on the fasted and fed correlation, but it is not widely reported in the literature, although multiple studies (see section 3.3) have measured these values. This positive correlation of the amphiphilic variables indicates that statistical design of experiment protocols [30][31][32][33][34] that link low and high concentration values in order to investigate a variable's contribution to solubility are likely to examine media compositions that are not biorelevant (see section 3.5).
Figure legend: … Heikkila [42]; Holmstock [43]; Kalantzi [37]; Lindhal [12]; Litou [44]; Pedersen [19]; Persson [40]; Stappaerts [45]. • SIF literature reference data, straight label, Brinkmann-Trettenes [58]; Dressman [17]; Galia [23]; Jantratid [59]; Pedersen [18,19]; Sunesen [48]; Vertzoni [28]. ■ Design of Experiment sample points from Khadra [30]; □ Design of Experiment sample points from Madsen [31]. NB no SIF recipes contain free fatty acid or cholesterol.
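The text does not state which correlation coefficient was computed in SPSS. Given the non-normal distributions, a rank-based coefficient is a natural choice; the sketch below assumes Spearman's rho, computed as the Pearson correlation of tie-averaged ranks, and would be applied to each pair of variables (for example bile salt against phospholipid) to build a table like Table 3 or 4.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it uses only ranks, the coefficient is unaffected by the log/log versus linear choice of axes and detects any monotonic (not only linear) association.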

Comparison with literature HIF analysis values
The fasted and fed state data sets have been plotted in Figs. 2 and 3 along with available literature data on measured HIF composition in either the fasted (Clarysse [13]; Heikkila [42]; Holmstock [43]; Kalantzi [37]; Lindhal [12]; Litou [44]; Pedersen [19]; Persson [40]; Stappaerts [45]) or fed (Clarysse [13]; Holmstock [43]; Kalantzi [37]; Persson [40]; Stappaerts [45]) state. Note that the variables measured are not consistent across these studies, and several studies used pooled samples from varying numbers of subjects and different sampling protocols, for example pre-administration of water. Variability in the fed state related to the nature of the food intake will also be present in the literature data [46,47]. In general, even with these limitations, the literature results fit within the data cloud of this study, indicating that the data employed in this analysis are comparable to the published literature; a more detailed comparison is available in the original publication [21]. However, the literature results are spread throughout the space, consistent with the existing conclusion that HIF (both fasted and fed) is variable and implying that single HIF samples cannot be considered representative of the entire intestinal fluid compositional data space.

Comparison with literature SIF recipes
The literature contains multiple references and recipes for simulated media [24,30,32] that can be superimposed on the data sets (Figs. 2 and 3). Note that a comparison is only shown where the simulated media recipe employs the variables measured in this study.
The fasted state comparison is based only on bile salt, phospholipid and pH; however, the majority of recipes contain additional components, for example buffer (phosphate or tris/maleate), salt (sodium or potassium chloride) and pancreatin, and use physicochemical properties such as osmolality or surface tension as characteristics. Some of these additional recipe variables have been shown to have no significant impact on equilibrium solubility [30][31][32] and are therefore not critical, provided that specific issues, e.g. a common ion or poorly soluble salt effect, are not present for the drug being measured.
The fasted state bile salt/phospholipid comparison (Fig. 2) indicates that the recipes are spread throughout the measured space, with a general tendency towards the centre and towards high phospholipid concentrations. One system [29] is near the lower end of the distribution space and one of the recipes [28] matches the Euclidean centre. In some cases the difference between the median and Euclidean centres is minor [18,23,48]. Overall there is an excellent match between the central distribution values of this study and the multiple SIF media recipes, although different experimental methods were applied to determine the recipes. The bile salt/pH comparison is similar, but with less variation between the recipes and several using a pH value close to the Euclidean centre.
The fed state comparison is based around bile salt, phospholipid, free fatty acid and pH, and a similar issue with respect to recipe variables is evident [32]. For the fed recipes with bile salt and phospholipid there is greater consistency between researchers, which results in a lower coverage of the data space, a tendency towards high bile salt but low phospholipid or free fatty acid concentrations and, for pH, a large variation. With the exception of the bile salt/phospholipid ratio, where some systems are close to the mean, in the majority of cases the recipe values do not reflect the central data space values.
Figure legend: … Clarysse [13]; Holmstock [43]; Kalantzi [37]; Persson [40]; Stappaerts [45] (Kalantzi numbers indicate time (minutes) since the start of the fed state; see reference for full details). • SIF literature reference data, straight label, Dressman [17]; Galia [23]; Jantratid [29]; Kleberg [26]; Vertzoni [28]. ■ Design of Experiment sample points from Zhou [32]. NB no recipes contain cholesterol.
In general, all recipes lie within the measured data space, reflecting the scientific design approaches adopted to determine them, namely measurements of individual or pooled HIF samples and adjustment of SIF composition to give a measured drug solubility similar to that determined in HIF. However, the differences between the fed recipes and the central data set values indicate that further refinement of these systems is possible. The results also illustrate that a single measurement point based on any of the recipes cannot determine a drug solubility value representative of the entire HIF data space [49].

Comparison with DoE investigations
A proportion of the DoE sample points (Khadra [30] or Madsen [31]) lies outside or at the extremes of the data distribution and is therefore of limited biorelevance. In the fasted state the coverage of the bile salt/phospholipid relationship is excellent, but this is not the case for free fatty acid or pH. For free fatty acid, the high values applied are too high, although the low values are appropriate. For pH, the values applied by Khadra [30] are too low, with the Madsen [31] values providing better coverage. In the fed state, coverage is not optimal for any measurement, with DoE points clearly outside the measured HIF range or failing to include areas of the range [32]. This deficiency is in part due to the limited literature HIF measurements [15] available at the time of study design [30,32].
In addition, Figs. 2 and 3 highlight the limitations of applying a structured DoE analysis based on a central point and a distribution of high and low values. This approach is not applicable where the variable distributions are not normal (section 3.1) and the variables are correlated (section 3.2), indicating that a tailored DoE approach is required to remove analysis points that are not biorelevant.
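The point made above, that high/low DoE combinations can fall outside a correlated data cloud, can be checked by evaluating each design point against a fitted ellipse. The sketch below is a minimal example assuming the rotated-ellipse parameterisation (h, k, a, b, α) defined in the Multidimensional analysis section; the ellipse values and design points in the usage are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ellipse_value(x: float, y: float, h: float, k: float,
                  a: float, b: float, alpha: float) -> float:
    """Left-hand side of the rotated-ellipse equation; <= 1 means inside or on the boundary."""
    dx, dy = x - h, y - k
    u = dx * math.cos(alpha) + dy * math.sin(alpha)   # coordinate along the major axis
    v = -dx * math.sin(alpha) + dy * math.cos(alpha)  # coordinate along the minor axis
    return (u / a) ** 2 + (v / b) ** 2


def classify_design_points(points: Sequence[Tuple[float, float]], h: float, k: float,
                           a: float, b: float, alpha: float) -> List[Tuple[Tuple[float, float], bool]]:
    """Pair each design point with True if it lies within the ellipse."""
    return [(p, ellipse_value(p[0], p[1], h, k, a, b, alpha) <= 1.0) for p in points]


if __name__ == "__main__":
    # Hypothetical ellipse elongated along a positive correlation (45 degrees)
    h = k = 0.0
    a, b, alpha = 2.0, 0.5, math.pi / 4
    # Two-level corners: (high, high) and (low, high) in centred units
    print(classify_design_points([(1.0, 1.0), (-1.0, 1.0)], h, k, a, b, alpha))
```

The (high, high) corner lies along the major axis and is inside, whereas the (low, high) corner lies across the minor axis and falls outside, mirroring how a full factorial design pairs correlated variables at combinations rarely seen in HIF.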

Multidimensional principal component analysis and ellipsoid fitting
The principal component analysis results and the calculated ellipse for each pair of variables are presented in Tables 5 and 6 for the fasted and fed states, respectively, with a graphical representation of the ellipses in Figs. 4 and 5. The PCA indicates that in both the fasted and fed data sets pH is a minor contributor to the observed variation in intestinal fluid composition, in contrast to the DoE solubility analyses, in which pH is a major variable influencing solubility [30][31][32]. Figs. 4 and 5 show visually that the ellipsoid approach provides better data coverage and description than the DoE approach of individual variable high and low values (Figs. 2 and 3; see sections 3.2 and 3.5).
Pairs of data points on the ellipse can be calculated along with the percentage coverage of the data variation; these are presented in Table 7, with a graphical representation in Figs. 6 and 7. In the fasted state, 4 points cover 65% of the data variation, which increases to 96% with 8 points. In the fed state, the corresponding coverage values are 68% and 98%. The preparation of simulated media systems (either fasted or fed state) using these 8 points would therefore provide > 95% coverage of the potential intestinal compositions and therefore of the solubility values for a drug. If a lower level of coverage is acceptable, fewer measurements are possible, but because of the mathematical analysis applied, points must be removed in pairs (see Table 7). The DoE protocols requiring large numbers of experiments [30,32] have been successively modified to minimise the experimental load, for example by changing the design and reducing the number of variable concentrations investigated [33], with the smallest protocol consisting of nine experiments [35]. This approach reduces statistical resolution, since only the most significant factors impacting solubility are identified and factor combination effects cannot be determined. However, the solubility range can be successfully identified using only nine experimental points. The 8 points identified in this analysis could therefore be supplemented with a central point to produce a nine-point DoE analysis, providing in vivo statistical limits on the solubility range determined and possibly the ability to identify the major variables impacting solubility. If a central point or a point related to an existing SIF recipe were also employed, comparability with previous data sets might also be possible. This approach has caveats, since existing SIF recipes do not contain all the HIF variables analysed in this study (see Figs. 2 and 3), and solubility variability in simulated systems is inversely proportional to the number of variables included [36]. This will necessitate appropriate modification of the current SIF recipes, which inevitably limits retrospective comparison. In addition, the ellipsoid approach of determining the data cloud boundaries may suffer from inherent limitations, since the focus is not on the HIF zone of maximum population/point density, which will be in the centre of the cloud. This can be mitigated, as described above, by including a central point; however, although variable concentrations will be proportional across the space, solubility is unlikely to be [32,36], and the potential for missing solubility minima or maxima cannot be ruled out.
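The construction of the bounding points can be sketched under one plausible reading of the method: each retained principal component contributes a pair of points at ±1.5·√λ along its eigenvector from the median centre, and coverage is the fraction of total variance carried by the retained components. This reading is consistent with 4 points (two components) and 8 points (four components), but it is an assumption, as is the use of unscaled variables. The eigen-decomposition uses a small cyclic Jacobi routine so that the sketch needs only the standard library.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def covariance(data: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix (n - 1 denominator) of row-wise observations."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(d)] for i in range(d)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def jacobi_eigen(m: Matrix, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi method: eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    d = len(m)
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(max_sweeps):
        total = sum(a[i][j] ** 2 for i in range(d) for j in range(d))
        off = sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off <= 1e-20 * total:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # Plane rotation chosen so that the rotated a[p][q] becomes zero
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == j) for j in range(d)] for i in range(d)]
                rot[p][p] = rot[q][q] = c
                rot[p][q], rot[q][p] = s, -s
                a = matmul(matmul(transpose(rot), a), rot)
                v = matmul(v, rot)
    return [a[i][i] for i in range(d)], v


def pca_boundary_points(data: Sequence[Sequence[float]], n_components: int,
                        scale: float = 1.5) -> Tuple[List[List[float]], float]:
    """Median-centred points at +/- scale * sqrt(eigenvalue) along the leading principal axes."""
    centre = [median(col) for col in zip(*data)]
    values, vectors = jacobi_eigen(covariance(data))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    points = []
    for idx in order[:n_components]:
        half_axis = scale * math.sqrt(max(values[idx], 0.0))
        axis = [row[idx] for row in vectors]  # eigenvector stored as column idx
        for sign in (1.0, -1.0):
            points.append([c + sign * half_axis * e for c, e in zip(centre, axis)])
    total = sum(max(val, 0.0) for val in values)
    kept = sum(max(values[i], 0.0) for i in order[:n_components])
    coverage = kept / total if total > 0 else 1.0
    return points, coverage
```

Calling `pca_boundary_points(samples, n_components=4)` on a matrix of five-variable samples would return 8 points and the fraction of variance they span; with `n_components=2` it returns the 4-point subset.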
The current SIF recipes are applied as single-point measurements for screening solubility during drug development [38] and also for measuring solubility to determine DCS classification [5,6]. For poorly soluble drugs where pH is not a major solubility driver [30,32], solubility generally increases as the total amphiphile content and pH of a system increase [36]. This analysis indicates that a single point in either the fasted or fed state with the lowest total amphiphile concentration and pH (dependent upon drug pKa considerations), for example point 1 in Fig. 6 or 7, might represent the lowest possible intestinal solubility. Since > 95% of the media variation (and therefore also the solubility variation) lies at higher amphiphile concentrations and pH than this point, a simulated media recipe applying these conditions would determine a solubility value representing a lowest or worst-case scenario for application in drug development studies or for DCS classification [5,6,36]. This would be a more robust approach than a single-point solubility measurement in a simulated media recipe based on central distribution values. In the latter measurement, the solubility variation of a drug due to media changes will not be determined [30,32,49], which could result in errors. A similar limitation would also apply to solubility determinations in sampled HIF unless compositional analyses were performed to assess the variable concentrations with respect to the overall population distribution. In all cases, it would be prudent to check that no obvious drug-specific solubility effects influenced the measurement.
Table 7.
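Selecting a worst-case composition from a set of candidate points (for example the 8 ellipse points) can be sketched as choosing the lowest summed amphiphile concentration, with pH as a tie-breaker whose preferred direction depends on the drug's pKa. The dictionary keys are hypothetical, and summing the amphiphiles assumes all four are expressed in the same unit (mM); a real selection would also consider each drug's ionisation behaviour explicitly.

```python
from typing import Dict, List, Tuple

# Hypothetical keys; concentrations assumed to share one unit (mM) so they can be summed
AMPHIPHILES = ("bile_salt", "phospholipid", "free_fatty_acid", "cholesterol")


def worst_case_point(points: List[Dict[str, float]], prefer_low_ph: bool = True) -> Dict[str, float]:
    """Pick the candidate with the lowest summed amphiphile concentration, breaking ties on pH."""
    def key(p: Dict[str, float]) -> Tuple[float, float]:
        total = sum(p[name] for name in AMPHIPHILES)
        return (total, p["pH"] if prefer_low_ph else -p["pH"])
    return min(points, key=key)
```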

Time dependent variation and subject variability analysis
In both fasted and fed state, the intestinal fluid samples were collected every 10 min for a 90 min period [21], which allows determination of correlations between variables with time and calculation of a variable's rate of change between sample points.
In contents [14,15,37]. It is of note that in the fed state no time-based correlation was seen for bile salt and phospholipid, although time-based changes have been reported in the literature [37]; this may be due to the relatively short (90 min) sampling duration in this study. The results indicate that fed state HIF is time-dependent [13,37] and may require more sophisticated analysis than applied in this study. The rates of change of each variable between samples are presented as violin plots for both the fasted and fed states in Fig. 8. The striking result is that the average rate of change for all variables in either prandial state is very close to zero, although some, for example bile salt in the fasted state or free fatty acid in the fed state, exhibit a large variation. A similar analysis of these data was presented by the original investigators [21]. This finding indicates that on a short time scale (10 min) random concentration fluctuations take place, but that there are overall compositional limits or boundaries (Fig. 1). The variability may be related to, for example, variations in each subject's inherent intestinal fluid composition or consistency, sampling and analysis variability [50], the presence of "water pockets" [51], or catheter or peristaltic movements between measurements. It should be noted that the results in Fig. 8 do not capture longer-term (> 10 min) time-based trends (see above), for which a more sophisticated time-based method will be required. Time-based trends are present within the fed state for physiological reasons, owing to digestion and the movement of materials within the tract during this process [37,52].
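The rate-of-change calculation can be sketched as follows, assuming rates are expressed per minute and computed only between consecutive available samples of the same subject; the text does not specify the units or how missing samples were handled, and the series in the example is hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def rates_of_change(times_min: Sequence[float], values: Sequence[float]) -> List[float]:
    """Change per minute between consecutive samples of one variable for one subject."""
    pairs = list(zip(times_min, values))
    # Dividing by the actual interval also handles a skipped sample (e.g. a 20 min gap)
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in zip(pairs, pairs[1:])]


if __name__ == "__main__":
    # Hypothetical bile salt series (mM) sampled every 10 min
    rates = rates_of_change([0, 10, 20, 30], [2.0, 4.0, 3.0, 2.0])
    print(rates, mean(rates))
```

For equally spaced samples the mean rate for one subject reduces to (last value - first value) / (total duration), so a mean close to zero also reflects similar start and end concentrations, even when individual 10 min steps are large.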
It is also possible to calculate a Euclidean distance between time points and an average Euclidean distance per subject over the sampling period in either the fasted or fed state (Figure S2). The results illustrate that the fasted state has a lower Euclidean distance value and a narrower distribution than the fed state, indicating that the changes in concentration across all the variables between measured time points are smaller and more consistent. In the fed state the concentration differences are greater, and both the overall Euclidean distance and the spread of results are larger. This is perhaps to be expected given fed state intestinal physiology, with a larger fluid volume, increased secretions and tract peristalsis [53]. The increase applies to nearly all subjects, since the fed state value is larger than the corresponding fasted state value.
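The source does not spell out how the Euclidean distance was computed, so the sketch below makes two labelled assumptions: the five variables are standardised (z-scores) so that mM concentrations and pH contribute on a common scale, and the distance is taken between consecutive samples of the same subject and then averaged per subject. The function `mean_stepwise_distance` and the values for the two subjects are hypothetical.

```python
import numpy as np


def mean_stepwise_distance(samples, mu, sd):
    """Average Euclidean distance between consecutive samples of one subject.

    samples: (n_timepoints, 5) array in original units
             (bile salt, phospholipid, free fatty acid, cholesterol, pH).
    mu, sd:  per-variable mean and SD used to standardise the variables,
             so that mM concentrations and pH contribute on a common scale.
    """
    Z = (np.asarray(samples, dtype=float) - mu) / sd
    steps = np.linalg.norm(np.diff(Z, axis=0), axis=1)  # one distance per interval
    return float(steps.mean())


# Hypothetical fasted-state data: A moves widely, B stays closely grouped
subject_A = np.array([
    [3.0, 0.30, 0.50, 0.10, 6.8],
    [6.5, 0.80, 1.00, 0.20, 7.2],
    [2.5, 0.20, 0.40, 0.08, 6.5],
    [7.0, 0.90, 1.20, 0.25, 7.4],
])
subject_B = np.array([
    [3.1, 0.30, 0.50, 0.10, 6.8],
    [3.3, 0.32, 0.52, 0.11, 6.9],
    [3.0, 0.29, 0.49, 0.10, 6.8],
    [3.2, 0.31, 0.51, 0.10, 6.9],
])

# Standardise using statistics pooled over all samples
pooled = np.vstack([subject_A, subject_B])
mu, sd = pooled.mean(axis=0), pooled.std(axis=0, ddof=1)

d_A = mean_stepwise_distance(subject_A, mu, sd)
d_B = mean_stepwise_distance(subject_B, mu, sd)
print(f"subject A: {d_A:.2f}, subject B: {d_B:.2f}")
```

Standardising with statistics pooled over all subjects keeps the distances comparable between subjects; standardising each subject separately would instead remove the between-subject differences that this comparison is meant to show.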
Data for two subjects are presented for the fasted and fed states in Figs. 9 and 10, respectively. The lower Euclidean distance of subject B in the fasted state is easily visualised, since all the measurement values are closely grouped throughout the study period, whereas for subject A there is large movement of variable concentrations throughout the data space. In both cases the changes do not appear to be linked to a single variable but to a combination of all variables. This difference is not as evident in the fed state, possibly due to the higher and inherently more variable concentrations present. A comparable analysis is not available in the literature, and the finding potentially has interesting implications. Bioavailability and bioequivalence studies are normally performed in the fasted state, and the difference between subjects A and B (Fig. 9) suggests a possible reason for subject-specific variation in these studies. The behaviour also supports the suggestion (section 3.5) of applying a simulated media recipe that matches the low factor concentration and pH values to determine a worst case solubility value, since the majority of individuals (approximately 90%) will have fluid compositions with factor concentrations and pH values greater than this point.
The time-dependent compositional changes could be viewed as similar to the gastric transfer process [54], in which a drug is exposed to a rapid change in physicochemical environment that can induce precipitation. In this case, however, the change is not anatomical and, based on the results, will be transient, with a return to the original physicochemical conditions or even to a higher solubility environment. Further analysis will be required to define the exact nature of these changes, but where possible they should be factored into methods designed to explore supersaturation and precipitation [55].

Conclusions
This is an initial investigation applying a novel mathematical analysis that treats HIF as a five-dimensional fluid, using a published HIF composition data set in both the fasted and fed states. None of the five variables analysed (total bile salt, phospholipid, free fatty acid, cholesterol and pH) follows a simple statistical distribution. The amphiphilic variable (bile salt, phospholipid, free fatty acid and cholesterol) concentrations are positively correlated, indicating that statistically designed simulated intestinal fluid compositions may require adjustment to remove non-biorelevant combinations. For both the fasted and fed states, the analysis calculates up to eight points that provide > 95% statistical coverage of HIF compositional variation and could be used as a guide to link in vitro measurements to in vivo data. This indicates that a single point (depending upon the type of drug) could be applied as a worst-case scenario for in vitro estimation of BCS/DCS solubility classification. The analysis also shows that on a short time scale a variable's average rate of change in either prandial state is close to zero, with the average Euclidean distance between points greater in the fed than in the fasted state. This result, in combination with the overall statistical analysis, indicates that there are boundaries to HIF variability, especially in the fasted state. Further time-based analysis is required to fully examine the data, relate variable concentrations to time over a longer period and investigate the physiology of the fed state, where digestion and absorption are occurring.
The mathematical analysis applied is scalable: further data sets could be added to refine the statistical estimation, and it could be applied to different fed states [46], disease states [56] or specific patient age groups [57]. A consistent choice of HIF components analysed across studies would assist, along with (from a drug solubility perspective) identification and ranking of each component's solubility impact using DoE or related statistical approaches. Linking the compositional space to the physical characteristics of HIF would also provide further insights. The calculated ellipsoid boundary points can guide the development of statistically relevant SIF recipes that explore intestinal solubility space. However, this requires further experimental studies comparing drug solubility in these systems against published results in this space. Overall, this approach has revealed interesting insights into HIF composition, with multiple implications for future studies in this research area, and indicates that it may be a fruitful avenue for increasing understanding of this complex fluid.