Tellurium Notebooks - An Environment for Dynamical Model Development, Reproducibility, and Reuse

The considerable difficulty encountered in reproducing the results of published dynamical models limits validation, exploration and reuse of this increasingly large biomedical research resource. To address this problem, we have developed Tellurium Notebook, a software system that facilitates building reproducible dynamical models and reusing models by 1) supporting the COMBINE archive format during model development for capturing model information in an exchangeable format and 2) enabling users to easily simulate and edit public COMBINE-compliant models from public repositories to facilitate studying model dynamics, variants and test cases. Tellurium Notebook, a Python–based Jupyter–like environment, is designed to seamlessly inter-operate with these community standards by automating conversion between COMBINE standards formulations and corresponding in–line, human–readable representations. Thus, Tellurium brings to systems biology the strategy used by other literate notebook systems such as Mathematica. These capabilities allow users to edit every aspect of the standards–compliant models and simulations, run the simulations in–line, and re–export to standard formats. We provide several use cases illustrating the advantages of our approach and how it allows development and reuse of models without requiring technical knowledge of standards. Adoption of Tellurium should accelerate model development, reproducibility and reuse. Author summary There is considerable value to systems and synthetic biology in creating reproducible models. An essential element of reproducibility is the use of community standards, an often challenging undertaking for modelers. This article describes Tellurium Notebook, a tool for developing dynamical models that provides an intuitive approach to building and reusing models built with community standards. Tellurium automates embedding human–readable representations of COMBINE archives in literate coding notebooks, bringing to systems biology this strategy central to other literate notebook systems such as Mathematica. We show that the ability to easily edit this human–readable representation enables users to test models under a variety of conditions, thereby providing a way to create, reuse, and modify standard–encoded models and simulations, regardless of the user’s level of technical knowledge of said standards.

The considerable difficulty encountered in reproducing the results of published dynamical models limits validation, exploration and reuse of this increasingly large biomedical research resource. To address this problem, we have developed Tellurium Notebook, a software system that facilitates building reproducible dynamical models and reusing models by 1) supporting the COMBINE archive format during model development for capturing model information in an exchangeable format and 2) enabling users to easily simulate and edit public COMBINE-compliant models from public repositories to facilitate studying model dynamics, variants and test cases. Tellurium Notebook, a Python-based Jupyter-like environment, is designed to seamlessly inter-operate with these community standards by automating conversion between COMBINE standards formulations and corresponding in-line, human-readable representations. Thus, Tellurium brings to systems biology the strategy used by other literate notebook systems such as Mathematica. These capabilities allow users to edit every aspect of the standards-compliant models and simulations, run the simulations in-line, and re-export to standard formats. We provide several use cases illustrating the advantages of our approach and how it allows development and reuse of models without requiring technical knowledge of standards. Adoption of Tellurium should accelerate model development, reproducibility and reuse.

Author summary
There is considerable value to systems and synthetic biology in creating reproducible models. An essential element of reproducibility is the use of community standards, an often challenging undertaking for modelers. This article describes Tellurium Notebook, a tool for developing dynamical models that provides an intuitive approach to building and reusing models built with community standards. Tellurium automates embedding human-readable representations of COMBINE archives in literate coding notebooks, bringing to systems biology this strategy central to other literate notebook systems such as Mathematica. We show that the ability to easily edit this human-readable representation enables users to test models under a variety of conditions, thereby providing a way to create, reuse, and modify standard-encoded models and simulations, regardless of the user's level of technical knowledge of said standards.

Introduction 1
Multiscale dynamical modeling requires the ability to build large, comprehensive, and 2 complex models of biological systems. Examples include the Mycoplasma genitalium 3 whole-cell model [1] and the central metabolism of E. coli [2]. These models are often 4 composed of many submodels. Typically, submodels are developed and validated by 5 other research teams. Indeed, without the ability to reuse existing models, constructing 6 larger models becomes impractical. 7 Being able to reuse tools and techniques developed by others is a hallmark of science. 8 Poor reproducibility of biomedical experimental studies has been recognized as a major 9 impediment to scientific progress [3,4]. Much of the focus on poor reproducibility has 10 been on wet lab experiments. However, barriers to reproducibility is also a significant 11 problem in computational studies [5][6][7][8]. In recognition of this problem, 12 reproducibility has become a central focus of scientific software [9,10]. The general 13 experience of researchers in the field of modeling suggests that a similar problem in 14 poor reproducibility also exists for biomodels. Difficulty in model reproducibility can 15 result from a published model not being deposited in a public repository or from 16 differences in the deposited model and the actual model used for published simulations. 17 In addition, it is difficult for researchers to utilize and modify public models because the 18 standards are not human-readable. This state of affairs imperils continued progress 19 with developing and exploiting biological models. 20 We propose that reproducible computational studies must satisfy two requirements. 21 First, they must be transparent; that is, researchers must be able to inspect and 22 understand the details of the model and the computational experiments. With 23 transparency, researchers can check assumptions and explore variations in 24 computational studies. Second, computational studies must be exchangeable; that is, it 25 must be possible for a study done in one computational environment to be done in 26 another computational environment and produce comparable results. For a study to be 27 exchangeable means that other researchers can make use of and build on the published 28 results in their computational environment. 29 In order to be transparent and exchangeable, a computational model and any 30 simulation experiments must be encoded in a standard format that separates the 31 reusable part of a model and its simulations (i.e. parameters, processes, and kinetics) 32 from the implementation used to simulate it (i.e. the numerical methods and algorithms 33 used to generate results). Models can be described using the Synthetic Biology Markup 34 Language (SBML) [11] or CellML [12] standards. These standards support models 35 based on ordinary differential equations (ODEs), stochastic master equations, and 36 constraint-based modeling [13]. Simulations can be described using the Simulation 37 Experiment Description Markup Language (SED-ML) [14], which encodes the types of 38 simulations, either time-course simulations or steady state computations, that should be 39 run on a model. SED-ML allows specifying the exact numerical algorithms needed to 1. It should represent the models or simulation specifications in a human-readable 52 form. 53 2. It should allow the user to easily edit this human-readable representation. 54 3. It should allow the user to provide narrative, annotations, or comments in order 55 to improve transparency. 56 4. It should translate the specifications into an implementation that can be used to 57 run simulations. 58 5. It must be capable of repackaging the model and/or simulation in a standard form 59 that is usable by other tools. 60 To address these requirements, When grown in continuous culture, yeast exhibit oscillations that can be maintained 98 for months and can cause the temporal separation of many cellular functions in a 99 synchronized way [29]. This type of dynamics belongs to a broader class of coupled 100 oscillators that are thought to be highly important in the organization of circadian 101 rhythms and play a role in the regulation of diurnal physiological activity in various 102 species [30,31]. Wolf, et al. [28] developed a model that uses metabolic coupling 103 mediated in part by the H 2 S pathway, one of the known contributors to respiratory 104 oscillations in yeast [29], to explain experimentally observed synchronized oscillations in 105 yeast cultures.

148
This case study shows that Tellurium provides an efficient means of converting 149 SBML models into exchangeable COMBINE archives containing simulation components. 150 Furthermore, COMBINE archives can contain important dynamical information about 151 the model, such as the influence of the parameter m that we explored in this study. In 152 order to demonstrate exchangeability of this study, we have exported it to the SED-ML 153 Web Tools [22] (Fig S1) and iBioSim [35] (Fig S2). Reproducibility requires that a model implementation produces results consistent with 157 the original study, especially if a different authoring tool is used. In order to provide 158 criteria for judging whether a model reproduction is consistent with the original, a set of 159 testing criteria are required, similar to the concept of unit testing in software. However, 160 researchers seldom perform extensive checks on the dynamics of models before using 161 them. This is due in part to the lack of tool support for easily modifying and producing 162 variants of models and simulations encoded in exchangeable formats. Tellurium's 163 authoring features enable modelers to encode dynamical unit tests in COMBINE 164 archives, thereby providing a way to verify that a model has been correctly reproduced. 165 For this case study, we reproduce a highly-detailed model of syncytial nuclear 166 divisions in the Drosophila embryo [36] through testing the model's dynamics under 167 different conditions. In many insect species, the embryo enters a period of rapid mitotic 168 division without cytokinesis [37] immediately following fertilization. In Drosophila, 13 of 169 these divisions occur within 3 hours of fertilization [36]. These divisions are regulated 170 by metaphase promoting factor (MPF), a complex between cyclin (specifically the 171 cyclin CycB in this model) and cyclin-dependent kinases (Cdk). CycB subunits tend to 172 be the limiting factor in complex formation, and are thought to regulate mitotic 173 division. CycB availability is controlled by the anaphase promoting complex (APC), 174 which targets CycB for degradation. However, in Drosophila, the levels of CycB appear 175 to remain high during the first 8 mitotic divisions [38]. This observation can be 176 reconciled with known mechanisms by assuming that CycB degradation only occurs in 177 the vicinity of the mitotic spindle [36,39,40], despite the absence of a nuclear envelope 178 during the mitotic divisions. To account for this hypothetical local degradation of CycB, 179 the model artificially separates the cytoplasm into two "compartments," with a 180 cytoplasmic compartment representing the cell and a nuclear compartment representing 181 the volume in the vicinity of the mitotic spindle.

182
As a starting point, we use the COMBINE archive encoding of this model by Scharm 183 and Tourè [41]. This archive contains SBML derived from biomodel BIOMD0000000144 1 , 184 which is intended to reproduce Fig 1 of [36]. However, the archive does not contain 185 more extensive tests of the model's dynamics, such as whether the model can be used to 186 reproduce several other simulations described in the paper. The initial variant encoded 187 by the COMBINE archive and shown in Fig 3B and C is based on a model with a 188 constant level of the phosphatase String, whereas in reality String levels change over the 189 course of the mitotic cycles. String regulates MPF via a positive feedback loop, and has 190 been shown to peak at the seventh or eighth cycle of the mitotic divisions [36]. To 191 account for this, Calzone et al. [36] posited that String mRNA is degraded by a 192 hypothetical factor "X," causing the synthesis rate of String to drop over time.    [36]. The model in [36] was authored using BIOCHAM [42]. Our model reproductions that reproduce these plots are available as a COMBINE archive [43].
8 . CC-BY-ND 4.0 International license peer-reviewed) is the author/funder. It is made available under a The copyright holder for this preprint (which was not . http://dx.doi.org/10.1101/239004 doi: bioRxiv preprint first posted online Dec. 23, 2017; • Enable synthesis and degradation of String by setting the parameters ksstg=0.02 197 and kdstg=0.015 respectively.

198
• Set the initial concentration of total String to zero by setting StgPc=0.

199
• Compute the total amount of unphosphorylated String by adding the rule Tellurium makes it easy to encode both the original variant, without String synthesis 204 and degradation, and the variant including these terms in a COMBINE archive [43]. Fig 205  3 shows the results of executing this COMBINE archive in Tellurium. We have thus 206 expanded the dynamical test cases for this model, as it now reproduces two simulations 207 from two different variants described by the original authors (Fig 1 and 3   However, the model can be shown to exhibit limit cycle behavior by 1) removing all 216 discrete events and 2) fixing the number of divisions by introducing the variable C as a 217 cycle counter. The number of nuclear compartments is then given by N = 1.95 C (1.95 is 218 a scaling factor described in [36]). For a given cycle number C, MPF exhibits limit cycle 219 oscillations, although the amplitude and period of these oscillations changes with the  cycles (C = 12) but not at early cycles (C = 1). This observation suggests that String 234 and Wee dynamics are indeed crucially important for late cycle oscillations, but not for 235 early cycle oscillations, confirming the shift in regulatory mechanism. These simulation 236 thus form a third set of unit tests for the model, encoded as a COMBINE archive [44]. 237 In summary, using Tellurium's editing capabilities, we have created an extensive set 238 of unit tests for dynamical behavior this model, which we exported as a COMBINE    [36] that the number of mitotic divisions in the Drosophila embryo is governed by a shift from negative to positive feedback, we first removed all discrete events and introduced the variable C such that N = 1.95 C . We then compared the limit cycles produced by this eventless model (left) with those produced by a variant with attenuated positive feedback from the regulators Wee and String (right). Attenuation was achieved by decreasing the rates of the phosphorylation and dephosphorylation of Wee and String. The original model exhibits stable limit cycle oscillations for both early cycles (C), which are putatively dominated by negative feedback, and late cycles (E), which are putatively dominated by positive feedback. The attenuated model only exhibits stable oscillations at early cycles (D), suggesting that positive feedback does indeed play a role in late cycle oscillations (F). Our model reuse and modification study is available as a COMBINE archive that reproduces the figure shown and facilitates further modification and reuse [44].

10
. CC-BY-ND 4.0 International license peer-reviewed) is the author/funder. It is made available under a In order to address the requirement of broad standards compliance, we tested Tellurium 256 against a set of tests provided by the SED-ML Web Tools [22]. These tests utilize  expected trajectories encoded as a common-separated values (CSV) file, and graphical 265 plots for reference. We converted each of these 1196 test cases into COMBINE archives 266 containing the SBML models, SED-ML simulations, and CSV expected results and used 267 these COMBINE archives as a benchmark for Tellurium's support for standards. The 268 results of this benchmark are shown in Table S3.

270
In order for the conclusions of a research study to be valid, the models used in the study 271 must be reliable. Using SED-ML to reproduce the dynamics of a model and compare 272 these dynamics with expected values adds crucial value to the integrity and validity of 273 studies that reuse or expand on the model. As an exchangeable format, SED-ML is 274 confined to the intersection of the most common features available in dynamical 275 modeling tools, which leaves out certain useful types of analysis (e.g. bifurcation 276 analysis). However, we argue that the use case of SED-ML is not to serve as a 277 replacement for current analysis methods. Instead, SED-ML is a tool to test the 278 dynamical behavior of models before using them. For example, while we were not able to 279 reproduce the bifurcation analysis of the mitotic division study [36] in an exchangeable 280 format, we were able to verify the observations regarding the shift in regulatory 281 mechanism, and in doing so gained new insight from this alternative approach. A 282 researcher may also wish to verify that the model reproduces certain expected behaviors. 283 For example, if the model is expected to exhibit switch-like behavior, does this behavior 284 occur at the correct input threshold? For models with feedback, such as integral 285 feedback control [46], does the output exhibit robustness in the presence of 286 perturbations? These types of validation require expert knowledge of the system. While 287 there are tools and resources to help with this, the most important point for conveying 288 this information to other researchers is to encode it as transparently and lucidly as 289 possible, which is achieved using the literate notebook approach described here. Tellurium's approach of blending standards with literate coding enables researchers 291 to create rich, detailed workflows incorporating community standards. Tellurium allows 292 the models and simulations from these notebooks to be shared with other tools via 293 COMBINE archives. This allows other users to import these models and simulations 294 and reproduce them using independently developed software tools. This is consistent 295 with our original definition of reproducibility, as it enables robust cross-validation of 296 results between tools, as opposed to simply repeating a previous simulation. It also helps 297 ensure that the tools themselves are accurate and free of idiosyncrasies that could affect 298 the analysis results. Model repositories such as BioModels [47,48], JWS Online [49], and 299 the CellML model repository [50] have enabled widespread support for the SBML and 300 CellML standards. We believe that better tool support for SED-ML and COMBINE 301 archives will help create a trend toward better adoption of these formats by repositories. 302

311
Python-based tools such as PySCeS [61] and PySB [63] can be used with a Jupyter  Tellurium's Python foundation makes it easy to combine with other Python-based 317 software such as PySCeS, COBRApy [64], and PySB. There are also many specialized 318 Python packages for specific tasks such as moment closure approximation for stochastic 319 models [65], parameter estimation [66], Bayesian inference [67], and estimating rate laws 320 and their parameter values [68].

321
In biomedical research, certain tools have been created specifically to facilitate 322 reproducible research. One such tool is Galaxy [69]. Galaxy is a web-based tool which 323 allows users to create workflows describing experiments, e.g. metagenomic studies [70]. 324 A similar tool with a focus on web services and which supports SBML-based workflows 325 is Taverna [71]. Galaxy and Taverna allow users to annotate each step of the workflow, 326 which provides a way for others to follow and understand the chain of reasoning used in 327 the workflow's construction. This satisfies the requirement of transparency, as it allows 328 users to view the sequence of steps used to produce a result. Although this approach is 329 very different from a literate notebook in terms of the way the user interacts with the input data and a specific sequence of steps. VisTrails also saves all changes made to a 337 workflow and allows users to view previous versions, a concept termed "retrospective 338 provenance" [73]. However, this approach also lacks exchangeability. Furthermore, while 339 graphical tools may be more accessible because they abstract away the underlying 340 algorithms, it can be difficult to isolate and correct software errors when a step fails due 341 12 . CC-BY-ND 4.0 International license peer-reviewed) is the author/funder. It is made available under a The copyright holder for this preprint (which was not . http://dx.doi.org/10.1101/239004 doi: bioRxiv preprint first posted online Dec. 23, 2017; to bad input or an internal error.

342
Many other research software systems make use of notebooks, and some incorporate 343 special extensions. StochSS [74], the GenePattern Notebook [75], the SAGE math 344 system [76], and the commercial Mathematica software [25] all utilize notebooks which 345 are specially tailored or feature special extensions for each respective application. 346 However, none of these approaches attempt to solve the problem we address: workflow 347 integration with exchangeable standards. Our usage of the literate notebook approach is 348 intended to satisfy two specific requirements, which are distinct from other use cases: 1) 349 to make these standards easy for humans to read, understand, and modify, without 350 requiring expert knowledge of the technical specifications of the standards, and 2) 351 provide an integrated workflow which facilitates exchangeability with other software.

352
The notebook approach used by Tellurium also has disadvantages. For example, it is 353 difficult to use notebooks with a version control system in a meaningful way.

354
Furthermore, large or complex analyses can be difficult to orchestrate using notebooks, 355 as interacting with a large notebook with many cells can be cumbersome. Nevertheless, 356 we believe that Tellurium's approach is highly useful in many crucial use cases, In order to build larger, more complete, and more accurate dynamical models of cells 361 and tissues, it will be necessary to reuse models of subsystems. This is currently very  Tellurium integrates SBML, SED-ML, and COMBINE archives within a notebook 370 environment, making it exceptionally easy for users to work with these standards, and 371 obviating the need for users to understand the technical specifications of the standards. 372 The availability of authoring tools such as Tellurium will make it possible for model 373 repositories to begin implementing support for SED-ML and COMBINE archives.

374
Indeed, the JWS Online repository [49] already has support for exporting COMBINE 375 archives of models and simulations, which can be read by Tellurium. We hope that 376 other databases will follow suit so that it will be possible to automatically extract 377 dynamical information from these repositories.

378
Tellurium's human-readable representation of COMBINE archives is highly 379 important for facilitating model modification as we describe here. This feature enables 380 researchers to experiment with models using alternate parameterizations in order to test 381 the dynamical behavior of the models under varying conditions. We hope that this will 382 lead to more robust models which lead to biological insight by providing predictions 383 under a wide range of circumstances, as with the case studies presented here. There is a clear need to support exchangeability of simulation experiments in order to 386 allow researchers to build larger, better tested, and more comprehensive models. 387 standards. This allows Tellurium to support the widest possible range of software tools, 389 but also prevents exchanging studies not covered by SED-ML's vocabulary of 390 predefined simulation types. Due to delays associated with standardizing and 391 implementing features, SED-ML tends to lag several years behind other systems which 392 do not rely on standardization. Thus, SED-ML has the advantage of stable support 393 from a wide range of tools, but has the disadvantage of lacking the flexibility to encode 394 custom studies based on recent advancements in model simulation. domain-specific-language implemented using the Scala programming language. This 402 allows users to mix in Scala code to access features not yet available via SESSL's public 403 interface. However, this approach is not language-agnostic and is tied to Scala and its 404 low-level execution engine. The SED-ML standard, in contrast, does not constrain the 405 low-level operation of its implementations.

406
In this paper, we have argued for modelers to construct "unit tests" for dynamical 407 models by including model variants as in the study by Calzone et al. [36]. We have 408 shown that these variants are easy to construct and encode in COMBINE archives using 409 Tellurium, but we have not addressed how to validate these tests in an automated way. 410 Due to simulation algorithm differences between tools and the presence of multiple 411 steady states in some models, performing a direct numeric comparison between steady 412 state values or timecourse traces may be too fragile to be useful.

413
The BIOCHAM software tool [42] employs an interesting solution by using temporal 414 logic constructs to make assertions about properties of model timecourse dynamics.

415
Using this approach, it would be possible, for instance, to make semi-quantitative 416 assertions such as "species X exhibits oscillations with a period of 100 ± 50mHz".    Examples of Tellurium's inline OMEX format for specifying COMBINE archives. These examples start with very simple cases and build on these cases with progressively more advanced features. The first example contains a simple two-species model and simulated using a deterministic and stochastic solver. The second example shows a phase plot. The third example shows multiple stochastic traces. The fourth example shows a one-dimensional parameter scan. All of these examples are available via a Tellurium notebook, which can be accessed by clicking on "File" → "Open Example Notebook" → "COMBINE Archive Basics" from within the Tellurium notebook viewer.

19
. CC-BY-ND 4.0 International license peer-reviewed) is the author/funder. It is made available under a // Names model1 is "Repressilator-regular oscillations" model2 is "Damped oscillations" task1 is "Oscillation using a deterministic simulator" task2 is "Damped oscillations using a deterministic simulator" Fig S5. A normative example of a COMBINE archive introduced in the original paper describing the COMBINE archive format [21]. This example contains the repressilator model [79], a damped oscillation variant showing the modification of model parameters using SED-ML, and a phase plot of the undamped system. To verify exchangeability, we have manually tested importing these archives into our software and also into the SED-ML Web Tools. The SBML test cases were too numerous to test in this way, so a subset of archives were tested with the SED-ML Web Tools whereas the full set of archives was tested with Tellurium using a Tellurium notebook [82]. The SBML test suite was converted into COMBINE archives using the provided notebook https://github.com/0u812/tellurium-combine-archive-testcases/blob/master/sbml-test-suite/convert-to-combine-arch.ipynb. These SBML test cases are automatically converted into COMBINE archives containing the expected results, which are then converted by Tellurium into inline OMEX and simulated. A "failed" test refers to a case where the numeric simulation results diverge from the expected values. An "unsupported" test refers to a test that uses features not available in our simulator (libroadrunner) or the inline OMEX strings.

22
. CC-BY-ND 4.0 International license peer-reviewed) is the author/funder. It is made available under a The copyright holder for this preprint (which was not . http://dx.doi.org/10.1101/239004 doi: bioRxiv preprint first posted online Dec. 23, 2017; The SBML test suite was converted into COMBINE archives using the provided notebook https://github.com/0u812/tellurium-combine-archive-testcases/blob/master/sbml-test-suite/convert-to-combine-arch.ipynb. These SBML test cases are automatically converted into COMBINE archives containing the expected results, which are then converted by Tellurium into inline OMEX and simulated. A "failed" test refers to a case where the numeric simulation results diverge from the expected values. An "unsupported" test refers to a test that uses features not available in our simulator (libroadrunner) or the inline OMEX strings.

23
. CC-BY-ND 4.0 International license peer-reviewed) is the author/funder. It is made available under a The copyright holder for this preprint (which was not . http://dx.doi.org/10.1101/239004 doi: bioRxiv preprint first posted online Dec. 23, 2017;  [22]. In order to demonstrate broad support for standards, we conducted a series of tests utilizing advanced usage of SED-ML. The first row shows the original example rendered in the SED-ML Web Tools. The second row shows the same example imported into Tellurium. The third row shows the simulation after editing the model in Tellurium. Finally, the fourth row shows the result of re-exporting the example to the SED-ML Web Tools using a COMBINE archive.

24
. CC-BY-ND 4.0 International license peer-reviewed) is the author/funder. It is made available under a The copyright holder for this preprint (which was not . http://dx.doi.org/10.1101/239004 doi: bioRxiv preprint first posted online Dec. 23, 2017; 81. Scharm M, Waltemath D. A fully featured COMBINE archive of a simulation study on syncytial mitotic cycles in Drosophila embryos. F1000Research. 2016;.

30
. CC-BY-ND 4.0 International license peer-reviewed) is the author/funder. It is made available under a The copyright holder for this preprint (which was not . http://dx.doi.org/10.1101/239004 doi: bioRxiv preprint first posted online Dec. 23, 2017;