Mathematical Modeling of Biofilm Structures Using COMSTAT Data

Mathematical modeling holds great potential for quantitatively describing biofilm growth in presence or absence of chemical agents used to limit or promote biofilm growth. In this paper, we describe a general mathematical/statistical framework that allows for the characterization of complex data in terms of few parameters and the capability to (i) compare different experiments and exposures to different agents, (ii) test different hypotheses regarding biofilm growth and interaction with different agents, and (iii) simulate arbitrary administrations of agents. The mathematical framework is divided to submodels characterizing biofilm, including new models characterizing live biofilm growth and dead cell accumulation; the interaction with agents inhibiting or stimulating growth; the kinetics of the agents. The statistical framework can take into account measurement and interexperiment variation. We demonstrate the application of (some of) the models using confocal microscopy data obtained using the computer program COMSTAT.


Introduction
Biofilms are structured communities of bacteria enclosed in an extracellular matrix composed of polysaccharides, proteins, and extracellular DNA adherent to a surface [1]. Unlike planktonic bacteria, biofilms exhibit differences in metabolism, antibiotic tolerance, and ability to evade the immune system, making infections due to biofilms difficult to treat [2]. Biofilms are a main cause of acute and chronic infections, including foreign-body infections, otitis media, and urinary tract infections.
When a population of microorganisms organized in a biofilm grows, it is likely to pass over different phases. The growth may follow a period of dormancy, if the environmental conditions before the beginning of the growth are not optimal. Eventually, cells start to divide and structure, and the biofilm grows into a period in which the overall rate of cell division prevails over that of their death. Under favorable conditions, the growth may be considered to be unlimited (hence exponential) for some time, but eventually [3] physiological and physical limits such as (i) exhaustion of available nutrients, (ii) accumulation of inhibitory metabolites or end products, and (iii) exhaustion of space intervene. As a result, the growth rate decreases and the colony reaches its maximum size. Repeated or continual exposure to these environmental or physiological stressors can then result in a decline in the biofilm size [4].
In vitro biofilms are often used in studies concerning therapy, dealing with the reactions of bacterial populations to various agents: drugs, for example, antibiotics, or mutagens [5][6][7][8]. Those are applied externally to the biofilm structures and change their environment or directly eliminate (kill) the bacteria or decrease their reproductive capacity.
A variety of mathematical models have been used to describe bacterial growth in investigation of dynamics in environment depending only on the activity of the bacteria [9,10]. Similarly, a number of models have been proposed to describe the action of different agents, in particular drugs and their interaction [11][12][13]. The objective of this paper is to obtain an integrated framework that provides models describing the different stages of biofilm growth, the action of different agents, and the simultaneous modeling of resulting kinetics for live and dead biofilm. The statistical issues associated with the use of these data, in particular the treatment of different sources of variability, are also addressed.
We demonstrate the use of the resulting general models using data obtained by means of confocal microscopy and COMSTAT [14]. COMSTAT takes the image stacks created by the confocal microscope as source data and produces up to ten image analysis features for quantification of biofilm structures which are output as one or more text files. The models we describe in this paper apply to univariate measurements: total biomass, area in a specific layer, average thickness, and volumes of microcolonies identified at the substratum (COMSTAT also obtains multivariate data, such as thickness distribution, which can be used to quantify the three-dimensional structures in the biofilm. The modeling of such data is the subject of current modeling investigation and will be reported in future communications.).

Mathematical Modeling.
Modeling of biofilm growth requires the specification of three components. The first, a function (⋅), describes the growth of the biofilm in absence of agents limiting or promoting growth. The second, a nonnegative function ℎ(⋅), describes the interaction with the agents. The third, (⋅), describes the temporal variation (kinetics) of the agents acting on the biofilm. The general model we consider expresses the rate of change of the biofilm as follows: where is time, ( ) is biofilm amount present in the system at time , 0 is the initial conditions or biofilm amount at = 0, and ( ) is the concentration of agent at time . The ± sign indicates that the model can describe either inhibition or stimulation.

Biofilm Growth.
A large number of models have been proposed to describe bacterial growth, for example, [15][16][17]. For the purpose of this paper, we only describe the simplest model for unlimited growth (exponential) and three semiempirical models for limited growth. Exponential biofilm growth assumes that the rate of growth is proportional to the amount of cells present in the system, and that there is no limitation to growth; thus where is the growth rate of the biofilm. The analytic solution of (2) is an exponential growth: with doubling time, , being equal to ln(2)/ .
In general, exponential growth can only describe the early stages of biofilm growth. A number of semiempirical models can be used to take into account the decrease in proliferation that results from limitations of nutrient supply or mechanical constraints or metabolites accumulation. We consider the Logistic, Gompertz, and Bertalanffy models. The Logistic [18] takes the following form: where max is the maximum biofilm level that can be reached.
The corresponding analytic solution is as follows: The Gompertz model [19][20][21] takes the following form: with analytic solution: Finally the Bertalanffy model [22] assumes that growth occurs proportionally to biofilm surface area, while biofilm loss is proportional to biofilm: where is the death rate for biofilm. The solution of (8) is also sigmoidal in shape and tends to an asymptote as time increases, where the birth and death term balance each other out. A general version of the Bertalanffy model takes the following form [22]: where 0 < < . The Logistic equation is a special case of this model with = 1, = 2.

Agents Interaction with
Biofilm. The simplest model describing the interaction of bacterial biofilm with an agent assumes that the action of the agent is proportional to the product agent and biofilm: where 1 now quantifies the interaction. For an inhibitory agent combining (10) with exponential growth (2) obtains According to this model if the agent concentration is kept at constant level / 1 , the growth rate is zero. That concentration is the minimal concentration of agent killing bacteria or BIC. The BIC is frequently used to characterize bacterial growth in planktonic, in vitro experiments, and Computational and Mathematical Methods in Medicine 3 clinical settings (e.g., [23]). Note that if the growth rate is not exponential, the BIC is not a constant but depends on the current amount of biofilm. For example, for the Logistic model, incorporating an inhibitory agent action takes the following form: and the BIC is given by the following equation: that shows how the BIC decreases as a function of ( ).
Similarly, for the Gompertz model, More complex models for the interaction agent/biofilm can be used, for example, threshold models, according to which the agent is effective if its concentration reaches a threshold, where the parameter 2 is the threshold; or models following saturation kinetics similar to what is used in, for example, pharmacodynamics [13], where now, for large concentrations of agent, the killing rate asymptotes to 1 ( ). In presence of two agents, 1 ( ) and 2 ( ), models (10) and (16) increase in complexity, as the agents can affect the killing rate in different ways. For example, the linear model (10) can be generalized as follows: where 3 is a killing rate for the agents' interaction. When saturation kinetics are present, as is the case for the single agent model (16), a number of possibilities arise, generating additive, synergistic, and antagonistic models often applied in pharmacodynamic, enzymology, or binding experiments [12]. For example, incorporating a model for competitive interaction between two agents yields where if > 0 the model shows additivity, and when = 0 it reduces to the competitive antagonism model and yields

Agent Kinetics.
A mathematical representation of the agents' kinetics is needed to incorporate their action on the biofilm. This in general does not present particular difficulties. Analytical solutions or sets of differential equations can be used to do so [24,25]. As an alternative one can use "model independent" representations of the data, such as smoothing splines [26], or, when the measurement error in the kinetics data is low an even simpler linear interpolant as we do in the examples reported below.

Modeling Dead Biofilm.
Confocal microscopy data allow the measurement of both live and dead cells (see Data section). Indicating by ( ) the dead biofilm present in the system at time , its rate of generation can be directly obtained from the growth equations reported above by identifying the loss or gain in live cells due to the presence of the agent. For the Logistic growth rate equation (4), this yields and, for the generalized Bertalanffy model equation (8), an inhibitory agent obtains For the Gompertz model, the growth equation does not lead directly to express a rate of cell loss since it only expresses a decrease of the growth rate inversely proportional to biofilm growth. A possibility is to express dead biofilm as follows: the difference between what would result from exponential growth and the actual biofilm level.

Modeling Post-Plateau Biofilm Decrease.
To account for effects leading to a post-plateau decline in the biofilm size [4], one can introduce a hypothetical endogenous variable, , that is reduced proportionally to the amount of biofilm present [27,28].
where is the rate of reduction. ( ) is arbitrarily set to 1 at time zero; it is positive as ( ) increases, but eventually it becomes negative (as it can be seen by the relationship ( ) = 1-∫ 0 ( ) ). The growth rate of biofilm takes the following form: where ( ( )) is given by (4) or (6). For example, for Logistic growth, The right side of equations (25) and (26) becomes negative as time progresses and leads ( ) to asymptote to zero ( ( ) asymptotes to a negative value). The main limitation of the model is that for any value of the biofilm always asymptotes to 0. To avoid this the model can be modified as follows: where and are production and elimination rates of the endogenous substance. Equations for dead cells are obtained as before.
We remark that especially in in vivo systems one can observe pre-or post-plateau appearance of cyclical growth, associated with seeding and dispersal [29,30]. To model these situations one could use a time-varying max that changes value after a pure time-delay or depending on a threshold biofilm value.
2.1.6. Modeling Dormancy. Biofilm growth can follow an initial period of dormancy that can be expressed easily using a (pure) lag time. The growth rate becomes where lag is the lag time.

Statistical Modeling and Model Selection.
Biofilm data usually consist of several measurements made on a number of growing "cells" on different occasions. The measurements present different levels of random variation: among measurements within a given cell (intracell variation), among cells (intercell variation), and interoccasion. A hierarchical nonlinear mixed effect model [31] can be used to represent such data. According to this model the th observation from the th cell and th experiment, , takes the following form: where is the corresponding sampling time, is a nonlinear function describing the relationship between time, parameters, and predicted response, and ℎ( , , ] ) describes the parameters values for each cell, which depend on , a fixed effect mean parameter vector, a vector of intercell random effect parameters, , and a vector of intercell random effect parameters, ] , with mean zero and a certain, typically multivariate normal, distribution with variance-covariance matrix Ω. The random effect indicates intracell variability (e.g., measurement error) typically assumed (multivariate) normally distributed with mean zero and covariance matrix Σ, which could depend on , (we omit details for simplicity) and an unknown parameter vector . Popular methods used to fit mixed effect models to data are implemented in the computer programs such as NONMEM [32] and NLME [33], which allow the computation of corresponding second-order statistics for the estimates, in particular the large sample variance-covariance matrix of the parameters , individual parameters as empirical-Bayes estimates, and corresponding predictions, conditional on the estimates for , Ω, Σ.
To select between different models one can use statistical model selection criteria together with the usual graphical displays based on predictions, observations, and residuals. Some statistical model selection criteria are the Hannan-Quinn (HQ) [34] and Akaike (AKA) [35]; these criteria penalize the likelihood of the model proportionally to the number of parameters in the model, thus penalizing the selection of models with 'too many' parameters. The HQ is the most conservative; it uses twice the log of the number of observations times the number of parameters as penalty, while the AKA uses twice the number of parameters.

Data.
Pseudomonas aeruginosa PAO1 tagged with green fluorescent protein (GFP) were studied in flow-chamber experiments described in [36]. In brief, biofilms were grown at 30 ∘ C in flow chambers. Each flow chamber was inoculated with 250 L of an overnight culture of PAO1 diluted to an OD 600 of 0.05 and left without flow for one hour. After one hour, flow was started with minimal media at a flow rate of 20 ml/h using a peristaltic pump (Watson Marlow 205 S). After cultivation for 24 or 72 hours, flow was stopped and minimal media were replaced with an antibiotic flask containing the desired concentration of either Meropenem (MEM) or Tobramycin (TOB) where administered as an intermittent bolus. This antibiotic flask is connected to bubble traps and the flow chambers containing the cultivated PAO1 biofilms. Flow was restarted and minimal media were pumped from the dilution flask to the antibiotic flask to the flow chambers at a constant rate calculated to mimic the elimination rate constant of the antibiotic. MEM and TOB were obtained from the pharmacy of the University of California San Francisco Medical Center [37]. Concentration-time profiles were based on previously described PK parameters of MEM and TOB from healthy volunteers and patients with cystic fibrosis [38,39]. The target MEM peak concentration based on human population values was computed to be 107.53 mg/L with a 1/2 = 0.893 h. The target TOB peak concentration, based on a dose of 10 mg/kg in a 70 kg adult, was 32.79 mg/L with an associated 1/2 = 2.75 h. Samples were analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) [40].
Microscopic observations of the flow cells were completed using a Leica TCS SP2 confocal laser scanning microscopy (CLSM) with an argon/krypton laser and detectors and filter sets for simultaneous monitoring of GFP (excitation, 488 nm; emission, 517 nm) for live cell staining and propidium iodide (excitation, 543 nm; emission, 565 nm) for dead cell staining. Images were acquired at approximately 1 m intervals in the direction down through the biofilm. Each channel of the flow chamber was randomly imaged at

Results
Biofilm Growth Models: Simulation. Figure 1(a) shows a plot of the Gompertz and Logistic models (see (7) and (5)) with parameters values 0 = 0.001, = 6, max = 1, and time in arbitrary units. Figure 1(b) shows their corresponding time derivative (i.e., the biofilm overall rate of change), and Figure 1(c) the biofilm doubling time. Doubling time is not a constant, as for the exponential model equation (3), but is instead expressed by the following relationships: for the Gompertz, and for the Logistic model. Note how the doubling time is only defined up to the time for which ( ) = max /2, where it tends to infinity. Figure 1(d) shows the BIC for the two models: note how the BIC is a decreasing function of ( ), because intrinsic growth rate decreases as ( ) reaches the asymptote max .
Simulated Data Example 1. Different growth models are often difficult to discriminate when fitted to data. Figure 2 shows data simulated using a Logistic (Figure 2(a)) or Gompertz (Figure 2 The Bertalanffy model equation (8) can describe rather different kinetics than the Logistic (and Gompertz) depending on the values of its parameters. In general one cannot reproduce data generated by generalized Bertalanffy process using either a Gompertz or Logistic equation.
Simulated Data Example 2. Figure 3 shows an example where data are simulated using the Bertalanffy model with = 4, = 2, = 0.25, = 1 (Figure 3(a)) or = 4, = 2, = 0.1, = 1 (Figure 3 Real Data Example 2. Figure 5 shows the result of the fits to biomass data collected in the same cells. The Logistic (dashed line) and Gompertz (solid line) fits estimate a similar value for max , the maximum growth level, and have slightly different shapes before asymptote toward max . The objective function values for the Logistic and Gompertz models were 585.86 versus 585.93, indicating that the models cannot be discriminated based on the fit. According to the Gompertz model, the maximum biofilm growth, ( max ), was 42.5 m 3 / m 2 with a birth rate ( ) of 0.051.3 1/h. Real Data Example 3. This demonstrates the use of a nonlinear mixed effect model on a complex data set that includes controls, single-dose administration of MEM at TOB at 24 hours (monoexponential decay), triple dose (monoexponential decay) of both drugs at 74 hours, and a combined singledose infusion of MEM and single dose (monoexponential) or TOB at 72 hours. Figure 6 shows the plot of the weighted residuals versus time for the fit to data; the solid lines show the R [41] function supsmu fit to the residuals. Note the overall lack of trends (a trend in the residuals would indicate a misfit of the model to the data), although there is a minor trend at the lowest and highest prediction values. Figure 7 shows plots of the observations (open circles) and empirical-Bayes predictions by the model (solid line) for a selection of the data sets. The data are selected based on the objective function value of the individual fit to each data set. Top to bottom, right to left they correspond to the 10% (best), 25%, 40%, 60%, 75%, and 90% (worst) fit of the model. Table 1 reports the parameter estimates, showing the partition of the variability of the parameter estimates. Figure 8 shows a simulation based on those parameter estimates: drug kinetics in Figure 8(a) and the corresponding biomass vsersus time plot in Figure 8 Real Data Example 4. Finally, as an example of a combined fit to live and dead cells Figure 9(a) shows the fit of the Logistic growth model incorporating drug effect according to (15) to the live biofilm average thickness data pooled from twelve experiments; Figure 9(b) shows the fit of to the dead cells; Figure 9(c) shows the pharmacokinetic profile imposed during in the experiments. Note how the model follows quite well the decrease in average thickness following the drug administration and tracks the corresponding increase of dead cells despite the large variability for both live and dead data measurements. The estimate of the killing rate ( 1 ) in this example is 0.00243 (1/h/ m).

Discussion
In this paper, we describe general mathematical models that can be used to quantify biofilm growth and interaction with different agents inhibiting or stimulating growth. The models are semiempirical in nature, in the sense that they do not include a detailed physiological description but allow the  quantification of biofilm growth and the interaction with growth-inducing or growth-reducing agents.
With regard to biofilm growth, we show that the commonly used BIC (the minimal drug concentration that inhibits growth) is not a constant when growth is limited, which is almost always the case with biofilms, but depends on the actual amount of biofilm and corresponding growth status.
We show how different growth models are often difficult to distinguish when fitted to data. This is due to the inherent flexibility of the models and also related to the sampling frequency and noise level present in the data. In the examples reported here, the sampling is relatively frequent. But in general, other experimental conditions might allow more frequent sampling and therefore the identification of more complex growth models and/or more complex models for the response to agents affecting biofilm growth.
The main advantage of the use of a mathematical model is that it can provide a characterization of complex data in terms of few parameters. As an example, Table 1 shows how a model with just five parameters characterizes seven distinct experiments. In addition, a nonlinear mixed effect model allows the identification not only of the parameters characterizing the growth and growth inhibition but also of the sources of variability present in a particular experimental situation. This is important because it allows the comparison of the quality of the data arising from different experimental conditions, and it suggests where experimental variation might need to be reduced. The main advantages of using a nonlinear mixed effect model are its statistical elegance and its efficiency. The main disadvantages are the many distributional assumptions involved in the formulation of the model and the computational burden involved in estimating the likelihood of the observations. A simple alternative, which takes advantage of the relatively abundance of data in in vitro experiments, is to fit data from each cell and experiment independently, obtaining the corresponding parameter estimates, which are then used to characterize the distribution of the parameters in the collection of experiments. This requires no specialized software and might be the most appealing method of choice in many situations.
The most important function of a mathematical model is the characterization of the process underlying data generation (rather than random variation associated with data collection). The models we describe represent a summary of models already present in the literature but also include potentially useful new models describing the kinetics of dead cells, together with models describing agents and agent interaction with growth. Taken together, they can allow the representation of a vast array of biofilm data.

Conflicts of Interest
All authors have no financial and personal relationships with other people or organizations that could inappropriately influence (bias) the work described in this article.