Statistical approach for selection of regression model during validation of bioanalytical method

The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration range frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using two nonparametric statistical tests: the One sample rank test and the Wilcoxon signed rank test for two matched groups of samples. The results obtained with the One sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models, because only slight differences between the errors (expressed as relative residuals, RR) were obtained. The significance of the differences in the RR was estimated using the Wilcoxon signed rank test, with the linear and quadratic regression models treated as two matched groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.


Introduction
The quality of bioanalytical data is highly dependent on the appropriateness of the regression model applied for quantitative analysis. The selection of an adequate regression model is the basis for accurate and reproducible quantification over the whole concentration range, whereas an inappropriate regression model is a source of error and imprecision in the bioanalytical method (Rozet et al., 2011). Ordinary least squares (OLS), which belongs to the group of linear regression models, is the most commonly used regression model for a narrow concentration range, where homoscedasticity of the data (constant variance over the whole concentration range) is presupposed (Singtoroj et al., 2006; Peters et al., 2007). Most bioanalytical methods for the estimation of drug concentration in all pharmacokinetic phases, however, use a wide concentration range. When the concentration range is broad, heteroscedasticity of the data (variance increasing with concentration) is expected. Applying the OLS model to heteroscedastic data generates inaccurate analytical results, especially at low concentrations. One way to handle heteroscedastic data is to apply weighted least squares (WLS) regression instead of OLS. The principle of weighting is to give more importance to data points with low variance and less importance to data points with high variance. An adequate WLS model balances the regression line so that the error is evenly distributed throughout the calibration range (Kimanani, 1998; Almeida et al., 2002; Singtoroj et al., 2006; Tellinghuisen, 2008). The weights most commonly used in bioassays are 1/x, 1/x², 1/y and 1/y². Another approach to fitting non-linear data and reducing heteroscedasticity is the quadratic regression model. This model is the simplest form of polynomial regression and, as with linear regression, it can be weighted or unweighted (Singtoroj et al., 2006; Hubert et al., 2007).
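The weighting schemes and the weighted linear and quadratic models described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the calibration routine of the Xcalibur data system used in this study: it solves the weighted normal equations (XᵀWX)β = XᵀWy directly, and the helper names (`weights_for`, `solve_linear_system`, `weighted_polyfit`) and the scheme labels are hypothetical.

```python
def weights_for(x, y, scheme):
    """Weights for the common bioanalytical schemes (labels are hypothetical)."""
    rules = {
        "none": lambda xi, yi: 1.0,  # unweighted (OLS)
        "1/x": lambda xi, yi: 1.0 / xi,
        "1/x2": lambda xi, yi: 1.0 / xi ** 2,
        "1/y": lambda xi, yi: 1.0 / yi,
        "1/y2": lambda xi, yi: 1.0 / yi ** 2,
    }
    rule = rules[scheme]
    return [rule(xi, yi) for xi, yi in zip(x, y)]


def solve_linear_system(a, b):
    """Solve a small dense system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def weighted_polyfit(x, y, w, degree):
    """Weighted least squares for y = b0 + b1*x (degree 1) or + b2*x**2 (degree 2).

    Builds and solves the normal equations (X^T W X) beta = X^T W y; returns [b0, b1(, b2)].
    """
    p = degree + 1
    xtwx = [[sum(wi * xi ** (i + j) for xi, wi in zip(x, w)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * yi * xi ** i for xi, yi, wi in zip(x, y, w)) for i in range(p)]
    return solve_linear_system(xtwx, xtwy)


# Hypothetical noise-free quadratic responses over the 0.1-50 mg/L range
x = [0.1, 0.5, 2.0, 10.0, 25.0, 50.0]
y = [0.01 + 2.0 * xi + 0.003 * xi ** 2 for xi in x]
for scheme in ("none", "1/x2", "1/y2"):
    print(scheme, weighted_polyfit(x, y, weights_for(x, y, scheme), degree=2))
```

Because the example data lie exactly on a quadratic, every weighting scheme should return coefficients close to [0.01, 2.0, 0.003]; with real, noisy data the schemes differ mainly at the low end of the range, where the 1/x² and 1/y² weights are largest.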
Maced. pharm. bull., 60 (1) 19-25 (2014). Natalija Nakov, Jasmina Tonic-Ribarska, Aneta Dimitrovska, Rumenka Petkovska

The drug regulatory agencies require that "the simplest model that adequately describes the concentration-response relationship should be used" (EMA 2011, FDA 2001), so the question is how to select the adequate regression model. Statistical criteria such as the coefficient of determination (R²) are only informative and not decisive, since different regression models give similar R² values and, in addition, acceptable R² values may be obtained despite impaired accuracy (Hartmann et al., 1998; Castillo & Castells, 2001; De Souza & Junqueira, 2005; Stockl et al., 2009). An approach based on comparing the sum of relative residuals (SRR) obtained for each regression model has been reported by several authors (Lang & Bolton, 1991; Wieling et al., 1996; Almeida et al., 2002). According to this approach, the regression model with the lowest SRR should be chosen as adequate. Singtoroj et al. (2006) proposed a strategy based on a non-parametric rank test for evaluating regression models in terms of the SRR obtained from the calibration curve fit and the calibration curve predictability.
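The point that R² alone cannot discriminate between calibration models can be illustrated with a small, entirely hypothetical data set. The sketch below (plain Python; the helper names are illustrative, not from the study) fits an unweighted and a 1/x² weighted straight line to six standards whose response equals the concentration except for a 2 % high bias at the top level, and reports the unweighted R² together with the RR of the lowest standard.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares line y = a + b*x (use w = 1 for ordinary least squares)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b


def r_squared(x, y, a, b):
    """Unweighted coefficient of determination of the fitted line."""
    ym = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ym) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def relative_residual(x_nom, y_obs, a, b):
    """RR (%) of the concentration back-calculated from the fitted line."""
    x_back = (y_obs - a) / b
    return (x_back - x_nom) / x_nom * 100


# Hypothetical standards: response = concentration, except +2 % at the top level
x = [0.1, 0.5, 2.0, 10.0, 25.0, 50.0]
y = [0.1, 0.5, 2.0, 10.0, 25.0, 51.0]

for label, w in [("OLS", [1.0] * len(x)), ("1/x^2", [1 / xi ** 2 for xi in x])]:
    a, b = weighted_linear_fit(x, y, w)
    print(label, round(r_squared(x, y, a, b), 4), round(relative_residual(x[0], y[0], a, b), 1))
```

Both fits give R² above 0.999, yet the lowest standard is back-calculated about 95 % too high by the OLS line and within about 0.1 % by the 1/x² line, which is why error-based criteria such as the SRR are preferred.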
An issue that arises is whether these approaches enable selection of the simplest regression model, especially when only slight differences between the SRR values are obtained. The objective of our work is to present a statistical approach for the selection of an adequate regression model during bioanalytical method validation. The proposed approach is based on two non-parametric statistical tests: the One sample rank test and the Wilcoxon signed rank test for two matched groups of samples.

Sample preparation and chromatographic conditions
The data that were statistically evaluated were derived from the validation of an HPLC-MS/MS method for ibuprofen (IBP) quantification in human plasma. Briefly, the calibration standards (CS) and quality control (QC) samples were prepared by spiking 50 µl of working standard solutions of rac-IBP into 950 µl of blank human plasma. The plasma samples, containing ketoprofen as internal standard (IS), were extracted using a liquid-liquid extraction (LLE) procedure. The analysis was conducted on a TSQ Quantum Discovery Max triple quadrupole mass spectrometer (Thermo Scientific, USA). Chromatographic separation was achieved on a Lux Cellulose 3 column (Phenomenex, 250 × 4.6 mm) using 0.1 % (v/v) acetic acid in methanol/water (90:10, v/v) as the mobile phase. Analyses were conducted at a flow rate of 0.6 mL/min and the injection volume was 10 µl. The mass spectrometer was operated with negative electrospray ionization, and ibuprofen and ketoprofen were quantified in selected reaction monitoring (SRM) mode using the transitions m/z 205.1→161.3 and m/z 253.2→209.2, respectively.

Data analysis
A six-point calibration curve was constructed from the peak area ratio (peak area IBP / peak area IS) vs. concentration over the range 0.1-50 mg/L on three days. QC samples at concentrations of 0.1 mg/L (LLOQ), 0.2 mg/L (LQC), 20 mg/L (MQC) and 40 mg/L (HQC), in five replicates per QC level, were analyzed along with the calibration curves. Five linear and five quadratic regression models were evaluated during the selection of the adequate curve fit.
The CS and QC data were fitted to OLS, weighted linear (1/x, 1/x², 1/y and 1/y²) and quadratic (unweighted and weighted) models using the calibration curve options of the MS data system (Xcalibur™ Data System, Thermo Scientific, USA). The relative residuals (RR) were calculated from the back-calculated concentration obtained with each regression model and the nominal concentration (Equation 1).
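$$\mathrm{RR}\,(\%) = \frac{C_{\mathrm{back}} - C_{\mathrm{nom}}}{C_{\mathrm{nom}}} \times 100 \qquad \text{(Eq. 1)}$$

where $C_{\mathrm{back}}$ is the back-calculated concentration and $C_{\mathrm{nom}}$ the nominal concentration. The SRR was computed from the average RR (%) obtained at each calibration level and each QC level (Equation 2).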

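$$\mathrm{SRR} = \sum_{i=1}^{k} \left| \overline{\mathrm{RR}}_i \right| \qquad \text{(Eq. 2)}$$

where $\overline{\mathrm{RR}}_i$ is the average RR (%) at level $i$ and $k$ is the total number of CS and QC levels.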
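Equations 1 and 2 can be sketched in plain Python as below. Two points are assumptions rather than details stated in the study: the SRR is taken as the sum of the absolute level-mean RR values, and quadratic models are back-calculated with the root of b0 + b1·x + b2·x² = y that reduces to the linear solution as b2 approaches zero (assuming an increasing calibration curve, b1 > 0). The function names and the data layout are hypothetical.

```python
import math
from statistics import mean


def back_calculate(y, coeffs):
    """Invert y = b0 + b1*x (+ b2*x**2) for x; coeffs = [b0, b1] or [b0, b1, b2]."""
    b0, b1 = coeffs[0], coeffs[1]
    b2 = coeffs[2] if len(coeffs) > 2 else 0.0
    # Numerically stable root of b2*x**2 + b1*x + (b0 - y) = 0;
    # it equals (y - b0) / b1 when b2 = 0.
    disc = b1 ** 2 + 4 * b2 * (y - b0)
    return 2 * (y - b0) / (b1 + math.sqrt(disc))


def relative_residual(x_back, x_nom):
    """Eq. 1: RR (%) of a back-calculated concentration."""
    return (x_back - x_nom) / x_nom * 100


def sum_of_relative_residuals(levels, coeffs):
    """Eq. 2 (assumed form): sum over levels of |mean RR|.

    levels maps each nominal concentration to the responses of its replicates.
    """
    total = 0.0
    for x_nom, responses in levels.items():
        rrs = [relative_residual(back_calculate(y, coeffs), x_nom) for y in responses]
        total += abs(mean(rrs))
    return total


levels = {1.0: [2.2, 1.8], 10.0: [21.0]}  # hypothetical responses
print(sum_of_relative_residuals(levels, [0.0, 2.0]))  # line y = 2x
```

For the line y = 2x, the two replicates at 1 mg/L back-calculate to 1.1 and 0.9 (RR of +10 % and -10 %, mean 0) and the single response at 10 mg/L to 10.5 (RR +5 %), so the SRR is about 5 %.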
The test of homoscedasticity was carried out using an F-test, and the experimental F value was compared with the tabulated value at a significance level of 0.05.
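A common form of this F-test compares the replicate variances at the lowest and highest concentration levels; the text does not state which levels were compared, so that choice is an assumption here. The sketch below uses only the standard library. Because the `statistics` module has no F-distribution quantile function, the tabulated critical value is passed in by the caller (9.28 is the 5 % critical value for 3 and 3 degrees of freedom, which matches the four replicates per level of this hypothetical example). The function name is hypothetical.

```python
from statistics import variance


def f_test_homoscedasticity(lowest_level, highest_level, f_critical):
    """F-test on replicate variances at the lowest and highest concentration levels.

    Returns (F, heteroscedastic): F is the larger sample variance divided by the
    smaller one, and the data are flagged as heteroscedastic when F exceeds the
    tabulated critical value supplied by the caller.
    """
    v_low = variance(lowest_level)
    v_high = variance(highest_level)
    f_exp = max(v_low, v_high) / min(v_low, v_high)
    return f_exp, f_exp > f_critical


# Hypothetical replicate responses, four per level
low = [0.10, 0.11, 0.09, 0.10]
high = [48.0, 52.0, 50.0, 50.0]
print(f_test_homoscedasticity(low, high, f_critical=9.28))
```

Here the variance at the top level is roughly 40 000 times that at the bottom level, so the data are flagged as heteroscedastic; the appropriate critical value must be looked up for the actual numbers of replicates.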
The non-parametric statistical tests (One sample rank test and Wilcoxon signed rank test for two matched groups of samples) were calculated using Microsoft Excel (Microsoft Corporation).

Results and discussion
The calibration range of a bioanalytical method should be established to allow an adequate description of the pharmacokinetics of the analyte of interest (EMA 2011). Therefore, before defining the concentration range, it is essential to have information about the expected concentrations of the analyte in the biological matrix. Considering that different dosage strengths of IBP (200 mg, 400 mg, 800 mg, etc.) are available on the market, and the information obtained from the literature (Canaparo et al., 2000; Bonato et al., 2003; Szeitz et al., 2010), a 500-fold concentration range (0.1-50 mg/L) was required.
The CS and QC data obtained during validation were first fitted to the OLS (unweighted linear) regression model, as this model is the starting point for the selection of the adequate calibration curve. Given the wide concentration range, heteroscedasticity of the data was expected. To confirm this expectation, a test for homoscedasticity was carried out. The F-test revealed that the variance is not evenly distributed over the whole concentration range, since the experimental F value (F = 75.4) was significantly higher than the tabulated one (F_tab = 9.28). The heteroscedasticity of the data was further confirmed by the RR of the CS samples obtained from the three-day calibration curves (Fig. 1). The RR at the lowest concentration was 930%, compared with 5% at the highest concentration. The RR data clearly showed that fitting the unweighted linear model generates inaccurate results, especially in the lower concentration range, indicating the need for regression models other than the unweighted linear one. Therefore, four linear WLS models and five quadratic models (one unweighted and four weighted) were further constructed on the same data set.
The regression models were evaluated according to the traditional approach based on the SRR (Almeida et al., 2002), with a slight modification of the SRR calculation: in our investigation the SRR was based on the errors obtained from both the CS and the QC samples, whereas in the literature it was based only on the errors obtained from the CS. The SRR generated by each regression model is presented in Table 1. The SRR obtained with the quadratic models (unweighted and weighted) were lower than those obtained with the linear models. In addition, all quadratic models generated nearly the same error (expressed as SRR), which is not the case for the linear models, where large differences between the errors were observed. The OLS and the 1/x and 1/y weighted linear regression models generated large errors, whereas the 1/x² and 1/y² weighted linear models gave errors similar to the corresponding weighted quadratic models. Considering that quadratic models are more complex than weighted linear ones, additional investigation was needed before selecting a quadratic model as the adequate regression model.
Statistical tests were applied to evaluate the significance of the differences between the errors obtained with the quadratic and the linear 1/x² and 1/y² models. A statistical approach to selecting the adequate regression model is important, since the selection of a more complex regression model should be justified. The proposed statistical approach was based on non-parametric tests because the obtained data were independent, their distribution was not Gaussian, and these tests are generally less affected by outlying values. A non-parametric One sample rank test, based on ranking the regression models in terms of the SRR obtained from the calibration curve fit and the calibration curve predictability, was reported by Singtoroj et al. (2006). The SRR of the CS obtained from each calibration level over three days was used to assess the calibration curve fit, and the calibration curve predictability was evaluated through the accuracy and precision obtained at four independent QC levels. The investigated regression models were then ranked according to the SRR obtained from the CS (Table 2) and the QC samples (Table 3). The final rank was obtained as the sum of the ranks for calibration curve fit and predictability (Table 4). The OLS and the 1/x and 1/y weighted linear regression models did not meet the acceptance criteria of ±15 % (±20 % for the LLOQ) given by the EMA guideline on bioanalytical method validation (EMA 2011) (Table 2 and Table 3) and consequently were not included in the further evaluation. According to this statistical test, the selection of the regression model is based on the final rank, so the quadratic 1/x² model, ranked 1, should be selected as adequate (Table 4).
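The ranking procedure described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' Excel worksheet: it assumes that models are ranked by SRR (lower is better, with tied values sharing the mean rank), that the EMA limits are checked on the mean RR of every level with the LLOQ listed first, and that the final score is the plain sum of the fit and predictability ranks. The function names and the example numbers are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srr(rr_levels):
    """Sum of absolute mean relative residuals (%) over levels."""
    return sum(abs(r) for r in rr_levels)


def meets_criteria(rr_levels, lloq_index=0, limit=15.0, lloq_limit=20.0):
    """EMA-style check: |RR| <= 15 % at every level and <= 20 % at the LLOQ."""
    return all(abs(r) <= (lloq_limit if i == lloq_index else limit)
               for i, r in enumerate(rr_levels))


def one_sample_rank(models):
    """models: name -> (cs_rr, qc_rr). Returns {name: rank sum}; lower is better."""
    kept = {n: v for n, v in models.items()
            if meets_criteria(v[0]) and meets_criteria(v[1])}
    names = list(kept)
    fit_ranks = average_ranks([srr(kept[n][0]) for n in names])   # calibration fit (CS)
    pred_ranks = average_ranks([srr(kept[n][1]) for n in names])  # predictability (QC)
    return {n: f + p for n, f, p in zip(names, fit_ranks, pred_ranks)}


# Hypothetical mean RR (%) per level, LLOQ first
models = {
    "linear 1/x2": ([3.0, -2.0, 1.0], [5.0, 2.0, -1.0]),
    "quadratic 1/x2": ([2.0, -1.0, 1.0], [4.0, 1.0, -1.0]),
    "OLS": ([95.0, 3.0, -1.0], [60.0, 4.0, 1.0]),
}
print(one_sample_rank(models))
```

In this example OLS is excluded because its LLOQ error exceeds 20 %, and the quadratic model obtains the better (lower) rank sum even though its SRR differs from the linear model's by only a few percent, which is exactly the situation in which the rank test alone cannot say whether the difference matters.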
The data in Table 2 and Table 3 indicated that the weighted 1/x² and 1/y² linear models also fulfill the EMA criteria of ±15 % (±20 % for the LLOQ) and, with the traditional approach, gave SRR similar to those of the quadratic models. However, these weighted linear models had final ranks of 5 and 7, respectively, while the corresponding quadratic models (1/x² and 1/y²) were ranked 1 and 6, respectively (Table 4). This evaluation indicated that the One sample rank test could not justify selecting the weighted quadratic over the weighted linear regression models. To evaluate the significance of the differences between the errors obtained with the weighted linear and quadratic regression models, the non-parametric Wilcoxon signed rank test for two matched groups of samples was applied. The test was performed using the RR obtained with the linear 1/x² and 1/y² models as the first group and the RR obtained with the quadratic 1/x² and 1/y² models as the second group. This test analyses not only the direction of the differences between the RR but also their magnitude. The data evaluated in the two matched groups were the average RR obtained at each calibration level over three days and the average RR obtained from the replicates at the four QC levels. The differences between the paired RR were computed, their absolute values were ranked, and a sign ("+" or "-") was attached to each rank according to the sign of the corresponding difference (Table 5). The null hypothesis (H0) was that there is no difference between the RR generated with the linear and quadratic models, against the alternative hypothesis (H1) that there is a difference between the RR obtained with the investigated regression models. The test statistic W is defined as the smaller of W+ (sum of positive ranks) and W- (sum of negative ranks). W+ was found to be 130 and W- was 80. The critical value of W for n = 20 at α = 0.05 is 52, and the decision rule is to reject H0 if W ≤ 52. Since W = 80 > 52, the null hypothesis could not be rejected.
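The Wilcoxon signed rank computation and the decision rule above can be sketched in plain Python as follows. The function names are hypothetical, zero differences are dropped (one common convention; the text does not say how zeros were handled), tied absolute differences share the mean rank, and the critical value must still be taken from a table, as in the study.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(group_a, group_b):
    """Return (W+, W-, n) for matched observations; zero differences are dropped."""
    diffs = [a - b for a, b in zip(group_a, group_b)]
    diffs = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in diffs])  # rank the absolute differences
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


def reject_null(w_plus, w_minus, w_critical):
    """Two-sided decision: reject H0 when W = min(W+, W-) <= tabulated critical value."""
    return min(w_plus, w_minus) <= w_critical


# Values reported in the study: W+ = 130, W- = 80, n = 20, critical W = 52 (alpha = 0.05)
print(130 + 80 == 20 * 21 // 2)  # consistency check: W+ + W- = n(n + 1)/2
print(reject_null(130, 80, 52))
```

The first check confirms that the reported rank sums add up to n(n + 1)/2 = 210 for n = 20, and the second returns False: W = 80 is above the critical value of 52, so H0 is not rejected.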
The results of the Wilcoxon signed rank test showed that, although the quadratic models generated smaller errors than the weighted 1/x² and 1/y² linear models, the differences between the errors were not statistically significant. Considering the regulatory requirement that the simplest model that adequately describes the data should be chosen, the proposed statistical approach justified the selection of the linear 1/x² model as the adequate regression model for the IBP calibration curve.

Conclusion
The present investigation showed that a statistical approach based on non-parametric tests should be used when selecting a regression model for bioanalytical assays with a broad concentration range. The results showed that the One sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models, because only slight differences between the SRR were obtained. The proposed statistical approach, based on the Wilcoxon signed rank test with the linear and quadratic regression models evaluated as two matched groups, allowed estimation of whether the differences in errors are statistically significant. The application of this simple non-parametric statistical test provides statistical justification for the choice of an adequate regression model.

Fig. 1. Residuals of the calibration standards plotted against concentration

Table 2. One sample rank test: ranking of the regression models based on the RR (%) obtained for the CS