SBpipe: a collection of pipelines for automating repetitive simulation and analysis tasks

Background The rapid growth of the number of mathematical models in Systems Biology fostered the development of many tools to simulate and analyse them. The reliability and precision of these tasks often depend on multiple repetitions and they can be optimised if executed as pipelines. In addition, new formal analyses can be performed on these repeat sequences, revealing important insights about the accuracy of model predictions. Results Here we introduce SBpipe, an open source software tool for automating repetitive tasks in model building and simulation. Using basic YAML configuration files, SBpipe builds a sequence of repeated model simulations or parameter estimations, performs analyses from this generated sequence, and finally generates a LaTeX/PDF report. The parameter estimation pipeline offers analyses of parameter profile likelihood and parameter correlation using samples from the computed estimates. Specific pipelines for scanning of one or two model parameters at the same time are also provided. Pipelines can run on multicore computers, Sun Grid Engine (SGE), or Load Sharing Facility (LSF) clusters, speeding up the processes of model building and simulation. SBpipe can execute models implemented in COPASI, Python or coded in any other programming language using Python as a wrapper module. Future support for other software simulators can be dynamically added without affecting the current implementation. Conclusions SBpipe allows users to automatically repeat the tasks of model simulation and parameter estimation, and extract robustness information from these repeat sequences in a solid and consistent manner, facilitating model development and analysis. The source code and documentation of this project are freely available at the web site: https://pdp10.github.io/sbpipe/. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0423-3) contains supplementary material, which is available to authorized users.

Table S1: Computational model equations and parameters.A minimal model representing the phosphorylation of the insulin receptor by insulin was used for testing the SB pipe package.Insulin is a constant input in the system.For this simple model the cell volume is set to 1. Units of parameters and amounts are arbitrary (a.u.).For a more descriptive graphic representation, see Supplementary Figure S1.This model and the analyses illustrated in this document are available with SBpipe in $SBPIPE/tests/insulin_receptor.Table S3: Parameter estimation results using the data set in Supplementary Table S2A.A sequence of 250 ts using COPASI's Particle Swarm algorithm was generated in order to estimate the kinetic rate constant parameters k1-k3.(A) Parameter values and condence intervals within 66%, 95%, or 99% condence levels are reported.The ratios between parameter value and left/right condences are also calculated as a measure of the number of folds between values and their condence intervals.This table is saved in a le called param_estim_details.csv within Results/MODEL_NAME.(B) Parameter estimation summary showing the minimum objective value, the associated Akaike information criterion (AIC), the AIC with the corrected term (AICc), the Bayesian information criterion (BIC), the number of estimated parameters, the number of total data points, the objective value for each condence level, the number of ts below each condence level.In the calculation of AIC, AICc, and BIC, additive Gaussian measurement noise of width 1 is assumed.If the objective function is χ 2 , the term −lnL(Θ|y) in the information criteria approximates the calculated χ 2 .Approximate 100(1 − α)% condence regions for a parameter p are computed by: χ Table S4: Parameter estimation results with identiability issues using the data set in Supplementary Table S2B.The condence intervals which could not be clearly identied are labelled inf (innite) and marked in red.Detailed result plots are shown in Supplementary Figures S6S8.For more detail, see Supplementary Table S3.
Figure S1: Diagram for the insulin receptor test model.This simplied model of insulin receptor was used for testing the pipeline.In the presence of insulin, the insulin receptor beta (IRβ) is phosphorylated at its Y1164 phosphorylation site.The phosphorylated receptor is then dephosphorylated and enters in a refractory state.This latter state is used to represent a delay in the system and simplies the processes of receptor internalisation, degradation and synthesis, reducing the number of model parameters.Finally from this refractory state the receptor can newly become functional.all the parameter sets with objective value lesser than the calculated 99% condence levels are plotted as black dots.The use of these additional parameter sets allows to sample a portion of the parameter space.For reasonably long t sequences, these parameter sets can be used to describe the neighbourhood of each estimated parameter.In particular, they can show whether well dened condence intervals can be calculated for each parameter.Using data set shown in Table S2A, for each of the estimated parameters, parameter sets at the left and right of each minimum value sharply decrease the model t with the data, suggesting that the parameters are uniquely identiable.Horizontal lines indicate the condence levels of 66% (magenta), 95% (green), and 99% (blue), respectively.The estimated parameter values and the respective condence intervals are reported in Supplementary Table S3.(B) The increase in t quality with respect to the number of internal iterations performed by a parameter estimation algorithm. (A) Figure S4: Parameter estimation pipeline (2nd group).Data generated using the conguration le described in Supplementary Figure S2.This gure shows the distribution of each parameter (column).The number of samples is determined by the applied lter.The rst row only includes the best ts (A), whereas the remaining rows include all the ts within the condence levels of 66% (B), 95% (C), and 99% (D), respectively, as calculated in Supplementary Figure S3.As in this case the parameters are clearly identiable, these distributions show that all the parameters converge to single parameter values (the parameter optima).For non-identiable parameters, see Supplementary Figure S7. Figure S5: Parameter estimation pipeline (3rd group).Data generated using the conguration le described in Supplementary Figure S2.This gure shows the correlations between parameters (x/y axes) indicating the t objective value with the colour (the lower the better).The number of points is determined by the applied lter.The rst row only includes the best ts (A), whereas the remaining rows include all the ts within the condence levels of 66% (B), 95% (C), and 99% (D), respectively, as calculated in Supplementary Figure S3.As in this case the parameters are clearly identiable, the parameters do not relate with each other.For nonidentiable parameters, see Supplementary Figure S8.S2B, the parameters k1, k2, and k3 are not uniquely identiable.The parameters k1 and k3 have indenite right and left condence intervals respectively.The parameter k2 is poorly identied despite the fact that condence intervals can be estimated.(B) See Supplementary Figure S3 for reference.S2A.These plots can be disabled using the options exp_dataset and plot_exp_dataset in the conguration le (see Supplementary Figure S9).S3.Therefore this parameter scan oers an indication of the robustness of the model output upon uncertainty for the parameter k1.

Figure S3 :
FigureS3: Parameter estimation pipeline (1st group).Data generated using the conguration le described in Supplementary FigureS2.Parameter estimations were executed using Particle Swarm algorithm in COPASI.(A) Approximated prole likelihood estimations (PLE) for the kinetic rate constants (k1-k3) are shown.Rather than considering only the parameter sets with the minimum objective value (the best ts calculated by a parameter estimation algorithm), all the parameter sets with objective value lesser than the calculated 99% condence levels are plotted as black dots.The use of these additional parameter sets allows to sample a portion of the parameter space.For reasonably long t sequences, these parameter sets can be used to describe the neighbourhood of each estimated parameter.In particular, they can show whether well dened condence intervals can be calculated for each parameter.Using data set shown in TableS2A, for each of the estimated parameters, parameter sets at the left and right of each minimum value sharply decrease the model t with the data, suggesting that the parameters are uniquely identiable.Horizontal lines indicate the condence levels of 66% (magenta), 95% (green), and 99% (blue), respectively.The estimated parameter values and the respective condence intervals are reported in Supplementary TableS3.(B) The increase in t quality with respect to the number of internal iterations performed by a parameter estimation algorithm.

Figure S6 :
Figure S6: Parameter estimation pipeline with identiability issues (1st group).Data generated using the edited version of the conguration le described in Supplementary Figure S2.(A) Using data set shown in TableS2B, the parameters k1, k2, and k3 are not uniquely identiable.The parameters k1 and k3 have indenite right and left condence intervals respectively.The parameter k2 is poorly identied despite the fact that condence intervals can be estimated.(B) See Supplementary FigureS3for reference.

Figure S7 :
Figure S7: Parameter estimation pipeline with identiability issues (2nd group).Data generated using the edited version of the conguration le described in Supplementary Figure S2.Parameter distributions using the best ts (A), all the ts within 66% condence level (B), 95% condence level (C), or 99% condence level (D).The distributions for the parameters k1, k2, and k3 indicate that those parameters are not well dened.In particular k2 and k3 show skewness and local minima.

Figure S8 :
Figure S8: Parameter estimation pipeline with identiability issues (3rd group).Data generated using the edited version of the conguration le described in Supplementary Figure S2.Parameter correlations using the best ts (A), all the ts within 66% condence level (B), 95% condence level (C), or 99% condence level (D).The parameters k1, k2, and k3 show non linear correlations in the logarithmic parameter space, particularly due to k1.

Figure S9 :Figure S10 :
FigureS9: Example of conguration le for the simulate pipeline.This conguration le was used to generate the Supplementary FigureS10Band the second plot in FigureS10C.A similar conguration le setting: model: "insulin_receptor.cps"local_cpus: 1 runs: 1 was used to generate Supplementary FigureS10Aand the rst plot in FigureS10C.Comments are marked in blue.

Figure S12 :
Figure S12: Single parameter scan pipeline.Data generated using the conguration le described in Supplementary Figure S11.Simulations were generated using LSODA algorithm in COPASI.(A) Parameter scan for IRβP ercent, a species which controls IRβ using percentages.IRβP ercent was scanned within the range [0, 100] percent.(B) Parameter scan for k1.The parameter k1 was scanned within the range [0.25119, 0.843858].This range corresponds to the condence interval at 95% condence level as shown in Supplementary TableS3.Therefore this parameter scan oers an indication of the robustness of the model output upon uncertainty for the parameter k1.

Figure S13 :Figure S14 :
FigureS13: Example of conguration le for the double parameter scan pipeline.This conguration le was used to generate the Supplementary FiguresS14 and S15.Comments are marked in blue.

Figure S15 :
Figure S15: Double parameter scan pipeline for IRβ_pY 1146 and IRβ_ref ractory (2nd group).Data generated using the conguration le described in Supplementary Figure S13.Simulations were generated using LSODA algorithm in COPASI.The InsulinP ercent and IRβP ercent species were scanned within the range [0, 100] percent.Each plot corresponds to a time point from 0 min to 10 min.Colours indicate species levels in arbitrary units.

Table S2 :
Data sets used for parameter estimation.Two data sets are independently used for parametrising the insulin receptor model.The rst data set (A) contains a sucient amount of data for eective parameter estimation (see Supplementary TableS3and Supplementary Figures S3S5).On the contrary, the second data set (B), which is a subset of the rst data set, is not sucient to estimate accurately the parameters (see Supplementary TableS4and Supplementary Figures S6S8).
where p are estimations for the parameter p, F α m,n−m is the F -ratio distribution, m is the number of model parameters, n is the number of data points, and α is the signicance level.This table is saved in a le called param_estim_summary.csv within Results/MODEL_NAME.Detailed result plots are shown in Supplementary Figures S3S5.