Bioassay analysis using R

We describe an add-on package for the language and environment R which allows simultaneous fitting of several non-linear regression models. The focus is on analysis of dose response curves, but the functionality is applicable to arbitrary non-linear regression models. Features of the package is illustrated in examples.


Introduction
Bioassays are experiments with biologically active compounds.In herbicide selectivity studies it is common to run suites of bioassays with dose response curves for different plant species and/or different herbicide preparations.The same principles apply to the study of compounds in toxicology and pharmacology.In herbicide research and development, potencies of compounds usually are compared at some a priori response levels, say 50% reduction (ED50) in biomass or other response variables.For example, ED10 denotes 10% effect, that is 90% of the untreated control.To describe herbicide selectivity we compare tolerance of crops, say ED10, and sensitivity of weeds, say ED90.These ways of comparing compound potencies and/or species sensitivity also are widely used in ecotoxicology of xenobiotics.
Therefore, it is relevant to develop software which is capable of carrying out simultaneous non-linear regression analysis on several bioassays.In this paper we will demonstrate the use of the software package drc for analysis of multiple dose response curves.The functions in drc provide a convenient means for specifying models, controlling the minimization and retrieving relevant results, including comparisons of parameters of interest.
The package drc is an add-on package for the language and environment R (R Development Core Team, 2005) which is open source and freely available (see http://www.R-project.org).R is an implementation of the language S.
The approach taken is to illustrate the functionality through several examples, where the function calls in R and the resulting output are interwoven into explanatory text.In this manner we present applications of the main functions contained in drc.We do not give an exhaustive description of the numerous facilities in drc.However, we wish to emphasize that various non-linear regression models can be fit using drc.
Sections 2 introduces the relevant statistical models and some aspects related to the parameters in the models.An overview of the package drc is given in section 3.In sections 4, 5, 6 and 7 we illustrate some of the features and functionality of drc for various aspects of fitting non-linear regression models with emphasis on bioassay analysis.

Non-linear regression models for bioassay data
We consider a suite of n dose response curves and we assume that for each dose response curve the response follows a non-linear curve specified by the function f , which is known apart from a parameter vector that may be different for different curves.Thus, the model for the ith dose response curve is yij = f (xij, αi) + εij j = 1, . . ., mi, i = 1, . . ., n, where xij denotes the jth dose value in the ith dose response curve and yij is the resulting response value, αi is the unknown parameter vector for dose response curve i and εij is the measurement error for the response yij.The measurement errors ε11, . . ., εnm n are assumed mutually independent and normally distributed N (0, σ 2 ).In particular, all observations have the same variance (variance homogeneity).
The parameters are estimated using non-linear least squares which amounts to minimizing the sum of squares with respect to the parameters (α1, . . ., αn).

Commonly used models
Two models for sigmoidal dose response curves are widely used: The logistic and Weibull models.We will briefly discuss these models.
The four-parameter logistic function is given by the formula with 4 parameters b, c, d, e.The parameter e is also denoted ED50 and it is the dose producing a response half-way between the upper limit, d, and lower limit, c.The parameter b denotes the relative slope around e. The interpretation of the parameters is discussed by Streibig et al. (1993).The logistic function is symmetric around e. The three-parameter logistic model with the lower limit equal to 0 has the form: and the five-parameter logistic model (Finney, 1979) is given by the formula Letting the parameter f be equal to 1 in (2) yields the four-parameter logistic model (1).
The four-parameter Weibull model is given by the formula The parameters c and d are the lower and upper limits, as for four-parameter logistic model, b is the relative slope around e, and the e parameter is the logarithm of the inflection point.The Weibull model is not symmetric around any point.The three-parameter Weibull model with the lower limit equal to 0 is then Besides the Weibull model and the logistic model, Brain-Cousens' model (Brain & Cousens, 1989) is used in situations where hormesis is present: For small doses the herbicide has the adverse effect resulting in response values above the level of the control group (Calabrese & Baldwin, 2001, 2003).Brain-Cousens model is obtained by modifying the four-parameter logistic model adding the linear term f x in the numerator.Again we could also consider the modification where the lower limit is set equal to 0.
Notice that the b and e parameters are different in (1), ( 2), ( 3) and (4), respectively.In this paper we assume that the response is a decreasing function of the dose (from maximum response to lower limit), corresponding to positive b, but the models also apply to situation where the response is increasing with the dose.

Initial parameter values and self starter functions
Contrary to linear regression, estimation of parameters in non-linear regression requires the specification of initial parameter values.The choice of the values may influence on the convergence of the estimation algorithm, in the worst case yielding no convergence and in the best case convergence in few iterations.For many non-linear functions it may be possible to obtain sensible initial parameter values using the interpretation of the parameters.For the four-parameter logistic function this is done by using the maximum (initial value of d) and minimum (initial value of c) value of the ys and then making the transformation Now linear regression can be used to obtain initial values of b and e.The procedure described can be captured in a function which for a given data set returns starting values.We will refer to such a function as a self starter function.A similar approach can be used for the Weibull model and Brain-Cousens' model.For another example see section 7.

Comparison of parameters
Having several fitted dose response curves, it may be of interest to compare parameters across curves, for instance comparing lower limits for different curves.
Typically, in bioassay analysis, the main interest lies in comparing quantities that are functions of the parameters.The common approach has been to re-parameterize and refit the model for each function of interest (Schabenberger et al., 1999).This approach is relatively computer-intensive, requires some skill in manipulating mathematical expressions and moreover is vulnerable to lack of convergence of the non-linear least squares algorithm, as the re-parameterization could result in strongly correlated parameters depending upon the distribution of the responses between the lower and upper limits.Therefore we implement a single-model approach where a single model is fit and the delta method (van der Vaart, 1998, chap. 3), is used to calculate approximate standard errors for functions of the parameters.
The quantities effective dosage (ED) and selectivity index (SI) are commonly used to compare different herbicides.Both ED and SI are functions of the parameters: EDy is defined as the dose that yields a response which is (100-y)% of the maximal response d (a reduction of y%).For instance, EDy can be expressed by means of the parameters b and e in the four-parameter logistic model EDy = e(y/(100 − y)) 1/b .The Weibull model yields a similar formula.For Brain-Cousens' model there are no closedform solutions, but numerical methods can be applied; we use the bisection method.SI(x,y) is the ratio between EDx for one curve and EDy for another curves.

SI(x,y) = EDx/EDy
In section 5 we will illustrate the interpretation of ED and SI based on an example.

About the package
The drc package consists entirely of interpreted R lines.This paper describes version 0.8-1.The current version as well as future versions can be found at www.bioassay.dk.
The main function is multdrc which carries out the estimation of parameters and returns a model fit, an object of class 'drc'.The default optimisation method is a variant of the Newton algorithm, but this setting can be changed in multdrc.The functions described in subsection 2.1 are built-in in multdrc and available by specifying the fct argument.Table 1 provides a list of some of the built-in functions available.
The default value is the four-parameter logistic model, that is l4() (which need not be specified).It is possible to specify initial parameter values manually, but as it may be difficult to guess at values of the parameters for the less experienced researcher, the built-in functions in drc have self starter functions.User-defined functions with a user-defined self starter function can be specified (see section 7).
Once a model fit is obtained using multdrc, the following methods are available for objects of class 'drc', for extracting information In addition the function plotraw produces a plot of the observations, all curves in a single plot or a plot for each curve.The functions compParm, ED and SI provide comparisons of parameters, ED values and SI values, respectively.The functions ED and SI work with the built-in nonlinear functions.User-defined functions can also be made to work with ED and SI, but it requires that they provide additional formulas for the calculation of ED values and their standard errors.The function diagnostics supplies brief information about convergence of the estimation algorithm.

Fitting a single dose response curve
To get started we need to load the package drc.This is done using the library function > library(drc) 'drc' has been loaded.
Please cite R and 'drc' if used for a publication, for references type 'citation()' and 'citation('drc')'.
The data sets that we will use in this paper are all part of the package.In this section we use the data set ryegrass consisting of a single dose response curve.The first 5 lines (rows) of the data set are displayed by typing ryegrass[1:5,] > ryegrass [1:5, ] rootl conc 1 7.580000 0 2 8.000000 0 3 8.328571 0 4 7.250000 0 5 7.375000 0 The variable conc is the concentration of ferulic acid in mM and the variable rootl is the root length of perennial ryegrass (Inderjit et al., 2002).
As already mentioned the main function in drc for fitting dose response curves is multdrc which can be used to fit data from one or more dose response curves.By default a fourparameter logistic model is fitted to the data.In order to fit this model to the data set ryegrass we write The argument to the function multdrc is a data frame which is a collection of columns of the same length, usually forming a data set or part of a data set.The above R call, multdrc(ryegrass), produces no output.All relevant information of the fit of the model to the data is stored in the object modelex1.The package drc provides extractors for extracting various types of information from modelex1.We will now introduce some of these extractors.
The anova function can be used to obtain a lack-of-fit test, comparing the four-parameter logistic model to a one-way ANOVA model > anova(modelex1) The t-statistics and corresponding p-values are for testing the null hypotheses that the parameters are equal to 0 (not necessarily relevant hypotheses to consider).The estimate of the common variance parameter σ 2 is 0.27.A short version of the output only containing the parameter estimates is obtained typing coef(modelex1) (using the function coef).
In cases where the assumption of variance homogeneity is violated, a Box-Cox transformboth-sides approach may help.The optimal Box-Cox transformation is calculated and applied if the argument boxcox=TRUE (short: boxcox=T) is specified.For the above example the call to multdrc would become > modelex1.boxcox<-multdrc(ryegrass, boxcox = TRUE) > summary(modelex1.boxcox)Apparently, the optimal parameter λ for the Box-Cox transformation, which is 0.4, is significantly different from 1.00, that is no transformation.This means that the variances of the responses are not homogeneous.A residual plot (not shown) using the call plot(fitted(modelex1.boxcox), residuals(modelex1.boxcox))showed that the variance inhomogeneity had been removed by the transformation.The fit of the model to the data can be seen in Figure 1.

Simultaneous fitting: Summarising the results
The data set PestSci consists of 5 curves each with 7 doses and in 3 replications.The variables are CURVE, DOSE and SLOPE, containing the curve number, the dose values and the response values, respectively.The response is rate of change of Oxygen consumption of chloroplast membranes versus the dose of a herbicide.Relevant references are Streibig et al. (1999); Nielsen et al. (2004).To load the data set PestSci type data(PestSci).
To get an overview an initial plot of the data set is useful.The function plotraw plots the dose values against the response values with different plot symbols for the different dose response curves (Figure 2).The plot may help getting a first impression of the data, in particular with respect to the ranges of dose values and the range of response values.
> plotraw(SLOPE ~DOSE, CURVE, data = PestSci, ylab = "Rate of Oxygen Evolutions", + colour = TRUE) q q q q q q q q q q q q q q q q q q q q q DOSE Rate of Oxygen Evolutions We consider simultaneous fitting of the 5 dose response curves in PestSci, assuming each curve follows a four-parameter logistic curve.Furthermore, we assume the parameters differ among curves (in total 4•5=20 parameters).This model is fitted using the multdrc function The formula SLOPE~DOSE in the first position relates response to dose.The variable CURVE in the second position is the grouping variable, uniquely assigning each observation to a curve.The argument data=PestSci specifies the data frame where the variables CURVE, DOSE and SLOPE are found.From the summary it is seen that the lower limit of dose response curve no. 1 and no. 5 are negative which, strictly speaking, is not meaningful, but these two lower limits are not significantly different from zero (right-most column).Comparison of parameters between curves can be accomplished using the compParm function.
A plot of the original observations and the fitted dose response curves is obtained using the plot function.The resulting plot is displayed in Figure 3, showing reasonable agreement between observations and fitted curves.
> plot(modelex2, xlab = "Dose", ylab = "Rate of Oxygen Evolution", + conName = "Control", col = TRUE) q q q q q q q Dose Rate of Oxygen Evolution The call plot(modelex2, obs="none") produces a similar plot, but without the observations.Also the argument conName specifies the name of the tick mark for the dose level that can be considered as dose 0. The usual graphical parameters in R can also be supplied in the function call; for instance xlab and ylab to change the default labels of the axes (which are the names of the variables in the data frame).For more details on the plot function see ?plot.drc.
As already mentioned in Section 2, the package drc provides functions ED and SI to compute EDx values and selectivity indices SI(x,y), respectively, for the built-in models like the four-parameter logistic model.As an example we calculate estimates of the parameters ED10, ED50 and ED90 and their standard errors for all curves in PestSci.The output provides standard errors of the estimates and p-values for testing the null hypothesis that the indices are equal to 1.The use of SI is best illustrated by using the methods chemical companies use to quantify selectivity in their research and development of new herbicides.Since herbicides have an effect on any plant, be it crops or weeds, we sometimes have to accept a small decrease in the crop yield, for example a 10% decrease might be tolerated (ED10), while a 90% control of the weed (ED90) is considered a reasonable control level.The larger the SI the more selective is one herbicide as compared to another herbicide.Obviously, the above SI(90,10) values are all much larger than 1.00, but only for curve no 4 and 5 the value is significantly larger than 1.00.
The function SI can also be used to compare potencies among absolute response levels.The EDx for a dose response curve is a relative response level depending upon the upper and lower limits of the curve.For example in PestSci the response for the dose ED50 for the 2nd curve is 0.54 ((0.94+0.13)/2)whilst for the 4th curve the response level is 1.38 ((2.15+0.08)/2).In fact 1.38 exceeds the maximum response level of the 2nd curve.Consequently, we sometimes want to compare two curves at a certain absolute response level and it can easily be done with SI.Let us assume that for various reasons we want to compare curves 2 and 4 at an absolute response level of 0.5.For the 2nd curve it means at the dose ED46 ( 0.46 = (0.5-0.13)/(0.94-0.13) ) and for 4th curve at the dose ED21.To obtain only the comparison between the 2nd curve and the 4th curve we use SI with an extra vector argument giving the curves to be compared > SI (modelex2, c(46, 21) In this section we illustrate how to reduce a model using significance tests.
The data set TM comprises of 7 response curves, measured at a number of positive dose values, and an additional group of measurements at dose zero.A separate control group frequently occurring case in bioassay analysis.There are three variables: dose is the dose, pct the curve number and rgr the response.The responses are growth rates of duckweed and the treatments are mixtures of two herbicides with different modes of action (Cedergreen, 2004).There are 180 observations.We define a simultaneous model, assuming that each dose response curve follows a fourparameter Weibull model (3) and with different parameters for different assays.This model is specified as follows > modelex3.1 <-multdrc(rgr ~dose, pct, data = TM, fct = w4()) Control measurements detected for level: 999 As pct take 8 different values (7 curves + control group), we are specifying a model with 8 curves, where the curve corresponding to the control group only is defined as dose equal to 0 and thus not constitutes an entire dose response curve.The asymmetric four-parameter Weibull model is specified with the argument fct=w4().The lack-of-fit test against the twoway ANOVA using anova (not displayed) confirms that the Weibull model fits the data.
A summary of the fit modelex3.1 is > summary(modelex3.1)A more reasonable initial model would only have a single d parameter common to all curves, as the control group determines the common upper limit for all 7 curves.This can be specified in multdrc using the collapse argument specifying which parameters should be collapsed across assays.This argument may be specified using a data frame or a list as argument.If no collapse argument is given there are different parameters for different curves (used in sections 4 and 5).The two types of argument are overlapping in their functionality, but not entirely identical, both have strong points.The data frame specification is better for collapsing parameters for arbitrary curves, without requiring the corresponding grouping variable to be defined.The list specification allows more general structures involving more than one variable per parameter.Specification of the collapse argument by means of a list of formulas, follows the same syntax as is used for lm and glm.
The data frame should contain as many columns as there are parameters in the model, eg four columns in case of the four-parameter Weibull model, and the columns correspond to parameters by order, eg the first column corresponds to b, . .., the fourth column to e.For built-in function the order of parameters is always alphabetical.Observations sharing the same value in a column will share the corresponding parameter in the model.We specify the model with common upper as follows modelex3.c(3,1) Therefore this column results in a single common d parameter for all assays.
The above model can also be specified using the names of the variables in TM and using the constant factor.Thus the alternative specification looks like modelex3.2<-multdrc(rgr~dose, pct, collapse=data.frame(pct,pct, 1, pct), data=TM, fct=w4()) or, using a list modelex3.2<-multdrc(rgr~dose, pct, collapse=list(~factor(pct), ~factor(pct), ~1, ~factor(pct)), data=TM, fct=w4()) In order to compare modelex3.2to modelex3.1 the anova function can be used.The test is highly significant and the model with common upper limit is rejected.Notice that the order of the two arguments in anova does not matter.The main problem with the data set TM is the presence of hormesis (described in section 5), thus a more appropriate approach would be to use a hormesis model (Cedergreen et al., 2005).
Another aspect of modelex3.1 is that it seems from the summary of modelex3.1 that all lower limits are equal.The model with a common lower limit is > modelex3.3<-multdrc(rgr ~dose, pct, collapse = data.frame(pct,+ 1, pct, pct), data = TM, fct = w4()) Control measurements detected for level: 999 The test for model reduction is not significant > anova(modelex3.Thus modelex3.3provides as good a fit to the data as does modelex3.1.Next we fit a model with the common lower limit equal to 0. This can be done using the built-in function w3() which is a three-parameter Weibull model (lower limit set to 0). > modelex3.4<-multdrc(rgr ~dose, pct, data.frame(pct,pct, pct), + data = TM, fct = w3()) Control measurements detected for level: 999 The F-test for reduction from modelex3.Thus we can reduce the initial model to a model with common lower limit equal to 0.

Conclusions
We have described the package drc for fitting multiple non-linear regression models, with special focus on applications in bioassay analysis.Thus the package drc is an attempt to develop a collection of functions specifically designed for bioassay analysis.Relevant interpretations of a model fit is easy using the built-in models.
At the same time we would like to emphasize that drc provides functionality for simultaneous fitting of arbitrary non-linear regression models under the assumption of independent and normally distributed measurement errors, such functionality was not previously available in R.

•
anova: lack-of-fit test or test for reduction between two models • coef: parameter estimates • fitted: fitted values • logLik log likelihood value • plot: plot of the fitted curves • residuals: raw residuals • summary: summary of the model fit • vcov: estimated variance-covariance matrix

Figure 2 :
Figure 2: Plot of the data in the data set PestSci.

Figure 3 :
Figure 3: Observed data and fitted dose response curves for the data set PestSci.

Figure 4 :
Figure 4: Observed data and fitted dose response curves for the data set TM.
The test is not significant, implying that the four-parameter logistic model provides as good a fit as the one-way ANOVA.A summary of the fit, including the type of model fitted, the parameter estimates and their standard deviations, is obtained using the summary function The lack-of-fit test comparing the simultaneous four-parameter logistic model to the alternative two-way ANOVA model is obtained applying the function anova to the object modelex2 The call to ED is Notice that the estimates for the parameter ED50 already were displayed above in the summary output as the standard parameterization of the four-parameter logistic model involves ED50.
, c(2, 4)) By default only the d parameter (the upper limit at dose zero) is estimated for the control group, represented in the summary by the term d:999.Figure4clearly shows that allowing individual upper limits for the individual curves does not produce a fit that agrees with the control group.
Thus in modelex3.2the parameters b, c and e will vary from curve to curve.The function colFct is used for collapsing a column into a column with fewer distinct values.The call colFct(TM[,3], 1:8) collapses all 8 different values in TM[,2] into a single value.The content of colFct(TM[,2], 1:8) is (again 180 values) The anova function with the two model objects as arguments produces an approximate F-test (or a likelihood ratio test) for test of reduction of the larger model to the smaller model.For the models modelex3.1 and modelex3.2anova gives an F-test for reduction from the model with different upper limits to the model with a common upper limit.