Crop2ML: An open-source multi-language modeling framework for the exchange and reuse of crop model components

Process-based crop models are popular tools to analyze and simulate the response of agricultural systems to weather, agronomic, or genetic factors. They are often developed in modeling platforms to ensure their future extension and to couple different crop models with a soil model and a crop management event scheduler. The intercomparison and improvement of crop simulation models is difficult due to the lack of efficient methods for exchanging biophysical processes between modeling platforms. We developed Crop2ML, a modeling framework that enables the description and the assembly of crop model components independently of the formalism of modeling platforms and the exchange of components between platforms. Crop2ML is based on a declarative architecture of modular model representation to describe the biophysical processes and their transformation to model components that conform to crop modeling platforms. Here, we present the Crop2ML framework and describe the mechanisms of import and export between Crop2ML and modeling platforms.


Introduction
The wide range of crop process-based models (PBM) reflects the evolution of our knowledge of the soil-plant-atmosphere system and the rich historical development for more than five decades (reviewed in Jones et al. 2017; Muller and Martre 2019). The high diversity of PBM is due to their multiple applications and the complexity of the system influenced by several factors, e.g. weather, soil, crop management (Basso et al., 2013) and genotypic factors (Wang et al., 2019). Most of the PBM are continuous models, formalized using ordinary differential equations, but are implemented as discrete time simulation models using finite difference equations. They are commonly decomposed into simpler biophysical functions (e.g. phenology, morphogenesis, resource acquisition, pests and diseases impact) often implemented by recurrent equations with control flows.
Another common characteristic is that PBM simulate plant growth and development at the scale of the canopy or average plant level without spatial dependence with a daily or sub-daily time step.
PBM are often implemented in modeling and simulation platforms at a higher level of abstraction to facilitate model development (Rizzoli et al., 2008). These platforms offer not only scalable, modular, and robust modelling solutions but also the ability to analyze, evaluate, reuse and combine models. The diversity of PBM led the crop modeling community to compare their performance and to improve them by aggregating modelers' knowledge or by introducing improvements provided by diverse research groups under the umbrella of large international collaborative projects such as the Agricultural Model Intercomparison and Improvement Project (AgMIP; Rosenzweig et al. 2013).
Studies conducted in the context of model intercomparison and improvement exercises (e.g. Asseng et al. 2013; Wang et al. 2017) pointed out the large uncertainty of PBM simulations and analyzed the sources of uncertainty or the processes involved. These intercomparison results showed the potential and limits of PBM and highlighted the need to analyze models at the process level, but also to exchange model components describing specific processes between simulation platforms (e.g. Donatelli et al. 2014; Wang et al. 2017). The uncertainty of a PBM component may be related to its validity domain, inputs, parameters, structure, and the underlying scientific hypotheses (Walker et al., 2003). Epistemic uncertainty may arise from incomplete knowledge, or a lack of knowledge, of these sources.
The uncertainty of a PBM results from the aggregation of the uncertainty of each of its components (Refsgaard et al., 2007). A framework that would allow the exchange of model components between different platforms would give crop modelers the ability to test alternative hypotheses in the same model, thus helping to reduce epistemic uncertainty.
Although most crop simulation platforms provide modular approaches and reuse techniques, there is little exchange of PBM components between them despite theoretical and application interests. PBM components often contain source code developed in different programming languages and are tightly coupled to the platforms. Therefore, model components are not seamlessly reusable outside the modeling platforms in which they have been developed without recoding or wrapping them (Holzworth et al., 2014; Rizzoli et al., 2008). Re-implementing a component in several platforms is a tedious and cumbersome task and requires a minimum knowledge of the different platforms. The wrapping solution treats components as black boxes taking little or no advantage of the framework (Rizzoli et al., 2008) or as white boxes but with a high level of complexity (Fernique & Pradal, 2018; Pradal et al., 2008). Other reuse approaches in environmental modeling have been explored. Declarative modeling can provide portability and facilitate integration between independent, uncoordinated models (Athanasiadis and Villa, 2013). However, model specifications are seldom separate from implementation details. Model builders often rely directly on an implementation that hides the scientific content of a model (i.e. its algorithm) and its structure.
Moreover, the publication of PBM components in scientific journals does not provide sufficient description associated with the modeled processes, which is a fundamental criterion for reuse (Pradal et al., 2013).This raises the problem of reproducibility and reliability of scientific results that are strongly linked to the platforms in which the models have been implemented and tested (Cohen-Boulakia et al., 2017;Hinsen, 2016).
Visual domain-specific languages such as Simile (Muetzelfeldt and Massheder 2003) or Stella (Richmond, 1985) provide a rich graphical interface to build models but become difficult to use for complex models and require many widgets to represent nested control flows graphically. Multiscale modelling and simulation frameworks (Marshall-Colon et al., 2017; Pradal et al., 2015) propose model interface designs which enable communication of multi-language components as black box components. Declarative modelling languages are also used in the systems biology community, which has developed declarative open standards such as SBML (Hucka et al., 2010), CELLML (Cuellar et al., 2003), or NEUROML (Le Franc et al., 2012) to describe biological models. However, crop modelers generally use procedural modelling rather than a mathematical formalism like differential or reaction equations, as is commonly done in systems biology.
An alternative to the problem of PBM component reuse between PBM platforms is the use of a centralized framework that enables the development of PBM components regardless of the modeling platforms (Fig. 1). We followed this approach and developed a modeling framework called Crop2ML (Crop Modelling Meta Language) that separates the structure of a model component from its implementation. Given that the wrapping solution was excluded because of the lack of transparency and high maintenance cost and that Crop2ML does not aim at replacing existing modeling platforms or at simulating components within large modeling solutions (crop models), we created a solution that generates components, from a metalanguage, for specific PBM platforms. It provides a centralized PBM components repository to store model components in a standard format to facilitate their access and reuse. This reuse approach is supported by the Agricultural Modeling Exchange Initiative (AMEI), which brings together some of the most widely used crop modelling and simulation platforms, including the Agricultural Production Systems sIMulator (APSIM; Holzworth et al., 2018), the Biophysical Model Applications (BioMA; Donatelli et al., 2010), the Decision Support System for Agrotechnology Transfer (DSSAT; Jones et al., 2003; Hoogenboom et al., 2019), OpenAlea (Pradal et al., 2015), the REnovation and COORDination of agroecosystems modelling (RECORD; Bergez et al., 2013), and the Scientific Impact assessment and Modeling Platform for Advanced Crop and Ecosystem management (Simplace; Gaiser et al., 2013), as well as other crop models such as STICS (Brisson et al., 2010) or SiriusQuality (Martre et al., 2006). Here, we first present the main components of the Crop2ML framework. Then we describe the mechanisms of importing and exporting between Crop2ML and PBM platforms. We then discuss our approach and present some perspectives.

Crop2ML: a centralized framework for crop model components development and sharing
Crop2ML is a framework for crop model component development, exchange, and reuse between PBM platforms. It is designed following FAIR principles for research software (Lamprecht et al., 2019) to provide:
• Simplicity: Model specifications are defined using a declarative language (eXtensible Markup Language [XML]; Bray et al., 2008) with generic concepts shared between PBM platforms, and model algorithms are encoded using a minimal language.
• Transparency: Models are shared as documented components in a well-defined format (Crop2ML format).
• Flexibility: Model units are composed with a shared abstract representation of model structure.
• Modularity: Three levels of modularity of models are defined: (single) model units, composite models, and packages. A package contains model units and composite models as well as data. It provides the flexibility to make different compositions based on these models.
We used the principles of Lamprecht et al. (2019) for assessing the FAIR-ness of the Crop2ML framework (Supplementary data Table C1).

Design and concepts of Crop2ML model specification
Software modularity is one of the main criteria of reuse. Jones et al. (2001) proposed key elements for modular model structure, which is an essential first step to enhance collaborative modelling effort. Crop2ML follows and extends these principles. In most PBM, the system is decomposed into compartments such as plant parts or soil layers that interact. For each compartment, different processes are described and assembled in components to simulate the response of the compartment. These processes can be subdivided into discrete, explanatory, independent biophysical sub-processes, which could be individually modeled (ModelUnit) or composed (ModelComposite). A modular model structure requires making an objective decomposition of the system to avoid coarse granularity models, which limit reusability. A ModelUnit should not encapsulate alternative assumptions and formalisms, making it easier to test them. In addition, input and output data management, such as data access, logging, and file generation, must be handled separately from the implementation of the model component. These design principles foster the reuse of components, which are intended to be integrated and simulated with a large variety of input data formats in different PBM platforms. Moreover, to emphasize modularity, the temporal integration loop must be removed from the model process implementation. This makes it possible to reuse the same process with different modeling formalisms or simulation frameworks that manage temporal dynamics of the simulation differently (e.g. different numerical integration techniques).
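The separation between a process and the temporal integration loop can be illustrated with a minimal Python sketch. The function `thermal_time_step`, its arguments, and the driver `simulate_daily` are hypothetical names chosen for illustration and are not part of Crop2ML. The process computes a single time step and performs no file access, while the loop over time belongs to the caller.

```python
from typing import Iterable, List


def thermal_time_step(cumul_tt_prev: float, t_mean: float, t_base: float = 0.0) -> float:
    """One time step of a hypothetical thermal-time process.

    No loop over time and no input/output handling: the function maps the
    previous state and the current inputs to the new state.
    """
    daily_tt = max(0.0, t_mean - t_base)
    return cumul_tt_prev + daily_tt


def simulate_daily(t_means: Iterable[float], t_base: float = 0.0) -> List[float]:
    """A driver owned by the platform (here a plain loop), not by the process."""
    state = 0.0
    trajectory = []
    for t_mean in t_means:
        state = thermal_time_step(state, t_mean, t_base)
        trajectory.append(state)
    return trajectory


print(simulate_daily([5.0, -2.0, 10.0]))
```

Because the step function is free of temporal logic, a platform with a different scheduler or time step could call it unchanged.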
Crop2ML provides a level of abstraction that enables a shared representation of model components between PBM platforms. A ModelUnit is defined with the following descriptive elements (Fig. 2a):
• a model description;
• a list of inputs;
• a list of outputs;
• an initialization step of the state variables;
• a link pointing to the source of the model algorithm;
• a list of usual mathematical functions;
• a set of unit tests with parameterization shared between modeling platforms.
A ModelComposite includes the same elements as a ModelUnit. In addition, it contains a list of Models and the links between them (Fig. 2b). However, if control structures are necessary to express the behavior of a ModelComposite, the Algorithm can be explicitly provided.
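The descriptive elements of a ModelUnit and a ModelComposite can be mirrored in a small in-memory data structure. The sketch below is only an illustration: the class fields, the file paths, and the `dangling_links` helper are assumptions made here and do not reproduce the Crop2ML schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModelUnit:
    """Hypothetical in-memory mirror of the ModelUnit elements."""
    name: str
    description: str
    inputs: List[str]
    outputs: List[str]
    algorithm_file: str                       # link to the algorithm source
    initialization_file: str = ""             # optional state initialization
    functions: List[str] = field(default_factory=list)
    tests: List[Dict[str, float]] = field(default_factory=list)


@dataclass
class ModelComposite:
    """A composite: a list of models plus links between their variables."""
    name: str
    description: str
    models: List[ModelUnit]
    # Each link is (source model, output name, target model, input name).
    links: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def dangling_links(self) -> List[Tuple[str, str, str, str]]:
        """Return links whose endpoints do not exist in the listed models."""
        by_name = {m.name: m for m in self.models}
        bad = []
        for src, out, dst, inp in self.links:
            ok = (src in by_name and out in by_name[src].outputs
                  and dst in by_name and inp in by_name[dst].inputs)
            if not ok:
                bad.append((src, out, dst, inp))
        return bad


vern = ModelUnit("Vernalization", "demo", ["t_mean"], ["vern_progress"], "algo/vern.pyx")
phase = ModelUnit("Phase", "demo", ["vern_progress"], ["phase"], "algo/phase.pyx")
pheno = ModelComposite("Phenology", "demo", [vern, phase],
                       [("Vernalization", "vern_progress", "Phase", "vern_progress")])
print(pheno.dangling_links())  # an empty list means every link resolves
```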
The Crop2ML model specification is based on XML. XML is a widely used declarative metalanguage for describing or structuring data in a portable format with some descriptive elements.
The XML format is used in several PBM platforms for template parametrization and model simulation configuration (e.g. APSIM, BioMA, RECORD, Simplace, SiriusQuality). This reinforces our choice of this format, since transformation between different XML documents or into any language is relatively straightforward, which allows XML to be used as a bridge between heterogeneous structures and facilitates collaborative development. Moreover, the use of XML and a formal description of model specifications and their associated metadata facilitate machine readability and model exchange. In the following sections, we describe the concepts of Crop2ML model specifications.

Description
The core description of a Crop2ML model contains the name of the model, an identifier that ensures the provenance of the model and a version number (Fig. 3). The identifier of the model is specified to keep the property of the component. Since PBM are dynamic models, the time step is an important factor that is specified to allow a multi temporal-scale composition. In addition, other elements are described to provide rich metadata, including author names and affiliations, citable and findable references (e.g. DOI) and a brief description of the model. The description also includes usage licenses compatible with the model dependencies.

Inputs -Outputs
In Crop2ML, a component takes parameter and variable values as inputs and produces variable values as outputs. A variable is a quantity which is given by the context of the experiment (input data) or calculated by the model (output data), while the value of a parameter is an input that can be specified by the modeler within a defined interval. Variables and parameters are distinguished with input type attributes and are categorized with variable category and parameter category attributes, respectively (Table 1). (Donatelli & Rizzoli, 2008). It also provides a common representation of date/time. The domain of validity of each variable is specified by min and max attributes. A measurement unit can also be associated with the variables and parameters. Fig. 4 gives an example of input and output specifications.
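The role of the min and max attributes can be sketched with Python's standard XML parser. The XML fragment below is a made-up example: its element and attribute names (`Input`, `inputtype`, `variablecategory`, `parametercategory`, `datatype`, `unit`) are assumptions modeled on the description above, not a copy of the Crop2ML schema, and `out_of_domain` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative specification; names and values are assumptions.
SPEC = """
<Inputs>
  <Input name="t_mean" inputtype="variable" variablecategory="auxiliary"
         datatype="DOUBLE" min="-50" max="60" unit="degC"/>
  <Input name="t_base" inputtype="parameter" parametercategory="species"
         datatype="DOUBLE" min="0" max="10" default="0" unit="degC"/>
</Inputs>
"""


def out_of_domain(spec_xml: str, values: dict) -> list:
    """Return the names of supplied inputs that violate their min/max domain."""
    violations = []
    for node in ET.fromstring(spec_xml).iter("Input"):
        name = node.get("name")
        if name not in values:
            continue
        # A missing bound is treated as unbounded on that side.
        low = float(node.get("min", "-inf"))
        high = float(node.get("max", "inf"))
        if not low <= values[name] <= high:
            violations.append(name)
    return violations


print(out_of_domain(SPEC, {"t_mean": 21.5, "t_base": 12.0}))
```

Here the parameter `t_base` is reported because 12.0 lies outside its declared interval, while `t_mean` is accepted.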

CyML: the common modelling language of biophysical processes in crop models
We defined a set of common features resulting from the intersection of the programming languages supported by PBM platforms to propose a shared modelling language. A design choice was to define a subset of an existing language that can provide these common features. We needed a widely used high-level language with a gentle learning curve so that modelers with basic programming skills could efficiently use it. Dynamic typing in the source language can make code transformation into statically typed programming languages ambiguous. Therefore, we chose Cython, a high-level language that combines the expressive power of the Python language with the explicit type declarations of the C language (Behnel et al., 2011). It is compiled directly into efficient C code, which improves runtime speed and makes it possible to interact with C, C++ and Fortran source code.
However, not all Cython syntax can be directly transformed into all target languages. For instance, the yield statement and anonymous functions are not supported by Fortran. Therefore, we defined CyML (Cython Meta Language), a subset of Cython to address the implementation of the model algorithm (Midingoyi et al., 2020).
We use CyML as a pivot language between various platform languages, which can be mapped to their syntax and semantics. The structure and syntax of CyML, as well as its transformation system to various languages and platforms, are detailed in Midingoyi et al. (2020). In brief, CyML supports datatypes defined in the model specification and provides standard mathematical functions and operators. In addition to local variable declaration and assignment statements, control structures are used in the flow of instructions described by the encoded algorithms. These include conditional statements (if, elif and else) to check if a condition is satisfied before addressing part of an algorithm, a sequential statement (for loop) with an incremental index on a data collection, and a repetitive statement (while) used to repeat part of an algorithm while a condition is satisfied. These structures can be nested. To support modular designs and the reuse of ModelUnits and functions, CyML provides import mechanisms, which assume that imported ModelUnits or functions are referenced.
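The control structures listed above can be illustrated with a short algorithm. The sketch is written in plain Python with type hints rather than in CyML itself, and the function `available_water`, its arguments, and the soil-layer logic are hypothetical; it only shows the kind of restricted, explicitly typed code that such a pivot language targets.

```python
from typing import List


def available_water(layer_water: List[float], layer_thickness: List[float],
                    root_depth: float) -> float:
    """Hypothetical algorithm using only if/elif/else, for, and while."""
    # Local variable declarations and assignments.
    total: float = 0.0
    depth: float = 0.0
    n_rooted: int = 0

    # Repetitive statement: count the layers explored by the roots.
    while n_rooted < len(layer_thickness) and depth < root_depth:
        depth = depth + layer_thickness[n_rooted]
        n_rooted = n_rooted + 1

    # Sequential statement: incremental index over a data collection.
    for i in range(n_rooted):
        # Conditional statement bounding each layer's water content.
        if layer_water[i] < 0.0:
            water = 0.0
        elif layer_water[i] > layer_thickness[i]:
            water = layer_thickness[i]
        else:
            water = layer_water[i]
        total = total + water
    return total


print(available_water([10.0, -1.0, 500.0, 20.0], [100.0, 100.0, 100.0, 100.0], 250.0))
```

Restricting an algorithm to these constructs, without generators or anonymous functions, is what keeps a translation to languages such as Fortran tractable.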
The Crop2ML framework provides a source-to-source transformation system (CyMLT) which converts CyML source code into procedural (Fortran, Python, C++), object-oriented (Java, C#, C++, Python) and scripting or functional (R, Python) languages (Midingoyi et al., 2020). CyMLT implementation relies on the transformation of the abstract syntax tree (AST) generated from the syntax analysis of the CyML code. The AST is transformed to a self-contained representation of the source code called Abstract Semantic Graph, which is independent of the source language. CyMLT proposes a unique approach to transform the Abstract Semantic Graph into readable source code in many different languages. The generated code is independent from the transformation system and can be run

Crop2ML model package
In the context of large projects and collaborative work, it is useful to define some requirements or standards to facilitate common exchange. Crop2ML provides a logical, standardized but flexible support to facilitate model sharing between modeling platforms through the definition of a directory structure (Fig. 8). This template includes a folder that contains the model description and associated algorithms, and a repository of source code for each language and modeling platform. It also includes a folder containing input data for a ModelComposite simulation, and a folder containing the unit tests.
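A package skeleton of this kind can be created and checked with a few lines of Python. The folder names in `FOLDERS` are placeholders chosen for this sketch; the actual layout is the one defined by the Crop2ML template (Fig. 8), and `create_package` and `missing_folders` are hypothetical helpers.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names standing in for the template of Fig. 8.
FOLDERS = [
    "crop2ml/algo",   # model descriptions and associated algorithms
    "src/python",     # generated source code, one folder per language
    "src/fortran",
    "data",           # input data for a ModelComposite simulation
    "test",           # unit tests
]


def create_package(root: Path, name: str) -> Path:
    """Create an empty package skeleton and return its path."""
    package = root / name
    for folder in FOLDERS:
        (package / folder).mkdir(parents=True, exist_ok=True)
    return package


def missing_folders(package: Path) -> list:
    """List mandatory folders that are absent from an existing package."""
    return [f for f in FOLDERS if not (package / f).is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    pkg = create_package(Path(tmp), "Phenology")
    print(missing_folders(pkg))  # empty when the skeleton is complete
```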
To save time and avoid omission of mandatory files or folders during package creation, we created a cookiecutter (Roy, 2017)

Model validation
Crop2ML model components can be validated by executing unit tests. This consists of using the parameter and variable values from the model specification to produce unit tests in different languages. Unit tests are generated in Jupyter notebook format, a document format for publishing source code and reproducible computational workflows that can be executed in the appropriate kernel in the Crop2ML software environment. This format is useful for code and documentation publishing and real-time collaboration when running on a remote server (Kluyver et al., 2016). Unit tests may also be associated with a model publication.
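The idea of deriving unit tests from values stored in the model specification can be sketched as follows. The dictionary layout in `TESTS`, the `precision` field, and the `thermal_time` model are assumptions made for this example; they do not reproduce the Crop2ML test format or its generated notebooks.

```python
import math
from typing import Callable, Dict, List


def thermal_time(t_mean: float, t_base: float) -> float:
    """Hypothetical model under test."""
    return max(0.0, t_mean - t_base)


# Test cases as they might be read from a specification (illustrative layout).
TESTS: List[dict] = [
    {"name": "warm_day", "inputs": {"t_mean": 18.0, "t_base": 4.0},
     "expected": {"value": 14.0, "precision": 2}},
    {"name": "cold_day", "inputs": {"t_mean": 1.0, "t_base": 4.0},
     "expected": {"value": 0.0, "precision": 2}},
]


def run_tests(model: Callable[..., float], tests: List[dict]) -> Dict[str, bool]:
    """Run each case and compare the output with the expected value."""
    results = {}
    for case in tests:
        computed = model(**case["inputs"])
        # The precision is read as a number of decimal places.
        tolerance = 10.0 ** (-case["expected"]["precision"])
        results[case["name"]] = math.isclose(
            computed, case["expected"]["value"], abs_tol=tolerance)
    return results


print(run_tests(thermal_time, TESTS))
```

Since the test data are independent of the implementation language, the same cases could be replayed against each generated version of a component.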

Model transformation
The success of Crop2ML model reuse through a white box approach comes from its ability to

Interoperability between various simulation platforms
The interoperability between simulation platforms is based on two transformation processes. In DSSAT and Record, the export process is mainly automatic, but some aspects need to be done manually. In DSSAT, the Crop2ML transformation system generates a submodule in Fortran 90 for each ModelUnit. It also generates a sequence of submodule calls for composite models. One issue that makes this transformation not completely automatic is that Crop2ML does not manage the handling of input and output files. Therefore, the input and output methods must be added manually to the generated submodules. The concepts of atomic and coupled models in Record are mapped to those of Crop2ML. Thus, atomic model classes are generated in C++ to correspond to ModelUnits.
However, the configuration and simulation file (VPZ) representing the ModelComposite is manually completed with further information such as the description of simulation result files.
The import process (from simulation platforms to Crop2ML) is only partially automatic. Platform tools automatically produce the meta-information in Crop2ML format, but algorithms are manually converted into the CyML language, which leads to a semi-automatic transformation. A completely automatic transformation would require the implementation of source-to-source transformation from the platforms' languages into CyML. ModelUnits meta-information. The process method (algorithm) is currently translated manually into CyML. Links between the different SimComponents (Unit) stored in the SimComponentGroup (Composition) are automatically exported to the Crop2ML structure. However, there is a loss of information, since whether a ModelUnit is activated or ignored is not transferred to the Crop2ML structure. In DSSAT, unlike in the other platforms, the description of physiological processes is provided as documentation in submodules and it is not fully complete with respect to Crop2ML specifications. Input and output variables, with their descriptions and units, can be clearly identified based on systematic platform guidelines. DSSAT submodules contain platform-specific variables, such as control variables, that need to be removed to produce CyML model algorithms. In Record, as in DSSAT, there is no explicit specification of a model. The documentation of a model within its associated C++ class is used to generate partial ModelUnit meta-information. The parsing of the VPZ file, which contains the structure of composite models in Record, is used to generate a ModelComposite. However, it is not possible to represent retroaction loops in Crop2ML as is done in Record with coupled models.
In order to illustrate Crop2ML concepts and transformation results, a phenology model and an energy balance model are used. Phenology, the timing of crop development, is at the heart of most crop growth models and is an essential component of most crop modeling platforms. The energy balance model involves interconnected components that allow estimating canopy temperature, evapotranspiration, and heat transfer between the canopy and the air. These processes are implemented as BioMA standalone components (Manceau & Martre, 2018) of the wheat PBM SiriusQuality (He et al., 2012; Martre et al., 2006). The two components were converted into Crop2ML packages, and then automatically translated into different languages and model components that conform to different PBM platforms. These packages are presented in Appendixes A and B. In Table 3 we illustrate how to represent a parameter and an algorithm in a Crop2ML ModelUnit and its translation with CyMLT into Record, BioMA, and DSSAT. The implementations of the model differ between the platforms. For instance, DSSAT defines a subroutine with all the variables as arguments, Record defines a class method (compute) with the variables as attributes of the class and uses a specific operator "()" to manage temporal variables, while BioMA defines a class method (CalculateModel) that takes as arguments data structures implementing each category of variables (state, rate, auxiliary, exogenous). The aim is to provide the platforms with alternative model components that could easily replace their corresponding components to analyze the effects of new hypotheses in their modeling solutions.
The sequence of ModelUnits that compose a Crop2ML ModelComposite is formally modeled as a directed acyclic graph. This means that there is no feedback loop or retroaction at a given time step, since these would be represented by a cycle in the ModelComposite. Alternatively, a state variable can be defined explicitly as two variables with respect to the current and the previous time. Thus, a composite model may take a state variable at the previous time as input and produce the state variable at the current time as output, implicitly making a loop with respect to time advance. Another way to represent feedback inside a time step is to associate an explicit algorithm with the ModelComposite that defines how to run it. However, this feature is not supported by two simulation platforms (OpenAlea and RECORD).
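The directed acyclic graph constraint, and the use of a previous-time variable to carry feedback across time steps, can be sketched with the standard `graphlib` module (Python 3.9 or later). The link tuples, the unit names, and the `_t1` suffix used here to mark a previous-time variable are assumptions of this example.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(links):
    """Order units so that each one runs after the units it depends on.

    links: iterable of (source unit, variable, target unit). A variable whose
    name ends in '_t1' (hypothetical convention) is a previous-time value and
    therefore does not constrain the order within the current time step.
    """
    graph = {}
    for source, variable, target in links:
        graph.setdefault(source, set())
        graph.setdefault(target, set())
        if not variable.endswith("_t1"):
            graph[target].add(source)
    return list(TopologicalSorter(graph).static_order())


# Feedback carried by a previous-time variable: still acyclic within a step.
links = [("Canopy", "canopy_temp", "Phenology"),
         ("Phenology", "lai_t1", "Canopy")]
print(execution_order(links))

# The same feedback expressed within a single time step forms a cycle.
try:
    execution_order([("A", "x", "B"), ("B", "y", "A")])
except CycleError:
    print("feedback within a time step is rejected")
```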

Discussion
The Crop2ML framework enables a user to exchange and reuse biophysical components between various PBM platforms through shared declarative specifications. The use of a minimal language to describe the model algorithm once and the transformation system facilitates reuse of models' components. ModelUnits and ModelComposite can be accessed and composed following a white box approach. Therefore, the Crop2ML approach greatly increases the ability of modelers to share their algorithms. The protocol will allow modelers to borrow components easily and will facilitate their intercomparison and improvement in different PBM platforms.

How does Crop2ML address model reuse compared to other initiatives?
Some initiatives addressed model reuse by providing multi-scale and multi-language integrative frameworks such as Crops in silico (Marshall-Colon et al., 2017) and the Open Modeling Foundation OpenMI (Buahin & Horsburgh, 2018). These frameworks can compose and simulate heterogeneous  2006) is similar to ours but it is limited to models where the dynamics of the modeled processes is represented by simple mathematical expressions without control structures, which does not match the crop modeling context. Hucka et al. (2003) used MathML (Ausbrooks et al., 2003) to express interactions between variables through mathematical formalisms well defined in the systems biology community. This approach is similar to that of Rizzoli et al. (2008) and is useful when processes are governed by differential equations.
However, in the PBM context, simulation platforms use algorithms to describe processes rather than mathematical formalisms with differential equations. Moreover, in PBM, variables that drive the system are temporal series that change the behavior of the system at discrete times. This does not require finding a general solution of recurrent equations used in crop models but rather estimating at each time step the state variables of the system.
Automated model transformation is a core aspect of model-driven development (Cuadrado & Molina, 2007). It uses Model-Driven Engineering (MDE) principles based on metamodeling concepts.
Crop2ML is in line with MDE. It defines structured concepts representing its metamodel, to which all Crop2ML models conform, and a model transformation to generate PBM platforms' components. Model Driven Architecture (Brown, 2004) is a framework of MDE that provides several standard languages (e.g., ATL, QVT, ETL, Henshin, VIATRA, and Stratego) for model transformation (Jouault and Kurtev, 2006; Kurtev et al., 2006). Crop2ML is based on a transformation process through a set of refinements of models and code with some extensible rules defined as templates in Python. Most MDE approaches allow model-to-model or model-to-code transformation, where, in our case, a model represents the specification. However, standard transformation languages were not suitable in our context for unifying the transformation process towards many languages with different paradigms (Bucchiarone et al., 2020). Crop2ML produces code in a target language but also adapts the code to fit PBM platform specificities. To our knowledge, model transformation languages in MDE do not support code generation in multiple languages with extended features in the same environment.

Connecting Crop2ML to PBM platforms
Given that Crop2ML datatypes do not handle complex data structures other than arrays and lists, some compromises or transformations should be made in the import-export process on the platform side with respect to handling other data structures used in platforms. The Crop2ML transformation system is designed to support the specificities of the target PBM platforms. However, the semantics of a Crop2ML model are based on shared concepts to describe at a high level a biophysical process by a discrete-time model. There is no semantic reason to support the description of each instance of the concepts. For example, since we have not defined a convention to name process variables, the integration of a Crop2ML component into a PBM modeling solution requires adapting the name of its variables. In the future, we could annotate Crop2ML models to add semantic information to make semantic links between any Crop2ML model variables or parameters with those of model components of PBM platforms. This would also allow semantic composability of Crop2ML models instead of syntactic composability, which analyzes whether the pair of variables to be linked is compatible. However, this would require the crop modeling community to agree on shared semantics and ontologies of crop model variables and parameter representations. Until now this has been a real challenge, as the crop modeling community has not been too keen on adopting standards (White et al., 2013). In addition to facilitating the exchange and reuse of model components, semantic descriptions of model variables and parameters would facilitate the linking of crop models to plant phenomics data (Neveu et al., 2018).
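The difference between adapting variable names and checking syntactic compatibility can be made concrete with a small sketch. It assumes, for illustration only, that two variables are syntactically compatible when their datatype and unit match; the `Variable` class, `can_link`, `rename_inputs`, and all variable names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Variable:
    name: str
    datatype: str
    unit: str


def can_link(source: Variable, target: Variable) -> bool:
    """Syntactic check only: names may differ, datatype and unit must match."""
    return source.datatype == target.datatype and source.unit == target.unit


def rename_inputs(values: Dict[str, float], mapping: Dict[str, str]) -> Dict[str, float]:
    """Adapt component variable names to the names used by a platform."""
    return {mapping.get(name, name): value for name, value in values.items()}


canopy_t = Variable("canopyTemperature", "DOUBLE", "degC")
t_mean = Variable("t_mean", "DOUBLE", "degC")
lai = Variable("leafAreaIndex", "DOUBLE", "m2/m2")

print(can_link(canopy_t, t_mean))   # same datatype and unit
print(can_link(lai, t_mean))        # units differ
print(rename_inputs({"t_mean": 12.0}, {"t_mean": "TMEAN"}))
```

A check of this kind cannot tell whether two compatible variables denote the same biophysical quantity, which is the gap that semantic annotation is meant to close.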
We were able to achieve fully automatic export of Crop2ML models to several PBM platforms.
The import process into Crop2ML is more mixed regarding the overall differences between PBM platforms. It is much easier to start with concepts shared and reused by PBM platforms than to start from divergent views of model representations to achieve a particular result. Some PBM platforms need to extend their concepts for model specification or to provide a rich model documentation in order to produce complete Crop2ML model specifications. This reveals the need for a good level of abstraction to represent a model in various PBM platforms. The higher the level of abstraction, the further the description moves away from the platforms and the less easy it is to understand. On the other hand, if the level of abstraction is too low, it is not always possible to represent all features of the models present in the platforms.

Future developments
A common model repository infrastructure is essential for efficient model exchange (Glont et al., 2018; Lloyd et al., 2008). Currently, Crop2ML model components are stored in GitHub repositories.
We aim to provide a Crop2ML model repository to store models in a shared format to make them easily accessible and reusable by the plant and crop modeling community. This repository should aim at hosting alternative biophysical processes. It will help modelers to operate on multiple model components, compare processes, or evaluate the impact of the integration of alternative models of biophysical processes in crop models. The success of the Crop2ML repository requires that the community gives access to their models by feeding the repository, which will be curated by the AMEI consortium to avoid error propagation.
Crop2ML has some limitations, which can be addressed in the next versions, either by extending the model specifications with shared concepts or by adapting the target PBM platforms to Crop2ML specification and language.It is an ongoing, long-term activity, to satisfy platform requirements and facilitate Crop2ML model life-cycle management to make Crop2ML a standard for the plant and crop modeling community.
The transformation of a model component of a PBM platform into a Crop2ML package requires rewriting the model algorithms in the CyML language. This limitation is currently being addressed by extending the CyML transpiler to a bidirectional transpiler. PBM platforms could thereby provide model algorithms in the language they use, and the extended CyMLT will transform them into CyML and into the target languages used by other PBM platforms (Fig. 11). This is a two-step process. First, the model algorithms in the language of the source PBM platform will be parsed and an AST will be generated. Second, the rules for transforming this AST into the CyML AST will be applied. The second step will reuse the CyML transformation tool developed by Midingoyi et al. (2020) to produce model algorithms compatible with other languages and PBM platforms.
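The two-step process described above (parse the source into an AST, then apply rewrite rules to that AST) can be illustrated with a minimal sketch. It uses Python's standard `ast` module as a stand-in and is not the CyMLT implementation: the rule shown, which expands augmented assignments, and the names `AugAssignExpander` and `normalize` are hypothetical and chosen only for illustration. It assumes Python 3.9 or later for `ast.unparse`.

```python
import ast


class AugAssignExpander(ast.NodeTransformer):
    """Hypothetical rewrite rule: turn `x += y` into `x = x + y`."""

    def visit_AugAssign(self, node):
        self.generic_visit(node)
        # Only simple names are rewritten; other targets are left untouched.
        if not isinstance(node.target, ast.Name):
            return node
        read_target = ast.Name(id=node.target.id, ctx=ast.Load())
        assign = ast.Assign(
            targets=[node.target],
            value=ast.BinOp(left=read_target, op=node.op, right=node.value),
        )
        return ast.copy_location(assign, node)


def normalize(source: str) -> str:
    tree = ast.parse(source)                # step 1: source code -> AST
    tree = AugAssignExpander().visit(tree)  # step 2: AST -> transformed AST
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)                # emit code from the new AST


print(normalize("total += rate * dt"))
```

In a bidirectional transpiler the second step would map platform-language constructs onto the intermediate AST rather than rewriting within a single language; the sketch only shows the parse, transform, and emit structure.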
Other future developments of Crop2ML include:
• Enhance Crop2ML model repositories with model annotation to link publications to models for reproducibility;
• Add unit checks and conversions in Crop2ML to improve model validity;
• Define a methodology to link Crop2ML with plant structure representation for multiscale viewing and analysis;
• Define and implement an ontology of crop model variables and parameters to allow better Crop2ML model interpretation and improve transformation between PBM platforms and the integration of model components in complex modeling solutions;
• Extend the Crop2MLab prototype by including bidirectional transformation and creating a web interface on a remote server, to give users the possibility to handle the Crop2ML model life cycle without local installation.

Conclusion
At the interface between modeling and software engineering, this paper addresses plant and

Fig. 1. From a combinatorial to a centralized exchange framework. The schema illustrates the reduction of import/export links between platforms in a centralized (right) versus combinatorial exchange framework.
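The reduction in links shown in Fig. 1 can be made concrete with a small count. The sketch below assumes that every directed import or export link requires one converter; the function names are hypothetical. Under that assumption, a combinatorial framework with n platforms needs n(n-1) converters, whereas a centralized framework needs 2n (one import and one export per platform).

```python
def combinatorial_links(n_platforms: int) -> int:
    """Directed converters when every platform exchanges with every other one."""
    return n_platforms * (n_platforms - 1)


def centralized_links(n_platforms: int) -> int:
    """Directed converters when every platform exchanges through one hub format."""
    return 2 * n_platforms


for n in (3, 5, 10):
    print(n, combinatorial_links(n), centralized_links(n))
```

For five platforms this gives 20 converters against 10, and the gap widens quadratically as platforms are added.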
• Findability: Model specifications include rich metadata and are assigned a globally unique and persistent identifier for each released version.
• Reusability: Model components are transformed into PBM platform-compliant code to support efficient interoperability.
• Reproducibility: Model components can be executed and tested regardless of the PBM platforms.

Fig. 4. Example of input and output specifications of a Crop2ML model.

Fig. 5. Example of a link to an algorithm file.

outside the Crop2ML framework. The transformation system integrates model documentation based on the model specification into generated code.
can be extracted as a stand-alone model from an existing package, tested, reused, or integrated into another ModelComposite or package. The notion of package dependency increases the modularity of Crop2ML and avoids model duplication.

Fig. 8. Tree view of the structure of a Crop2ML model component package.

Fig. 9. Visualization of the energy balance ModelComposite from the SiriusQuality wheat model developed with the BioMA platform. Ellipses are ModelUnits and arrows represent the links between two ModelUnits.

Fig. 10. Schematic representation of the Crop2ML framework showing the Crop2ML model life cycle from the creation of a package to model transformation.

(import and export) via Crop2ML. The import process consists of transforming any platform model component to a Crop2ML model. The export process consists of transforming Crop2ML models to any platform. Detailed descriptions of the import/export mechanisms in five widely used platforms with different architectures (BioMA, DSSAT, Record, OpenAlea, SIMPLACE) are provided in Supplementary data (Appendix C). Table 2 summarizes the interoperability of model components between these platforms. Platforms are based on various programming languages, which requires the definition of transformation rules between CyML and various languages including C# (BioMA), Java (Simplace), C++ (Record), Python (OpenAlea) and Fortran (DSSAT) in both directions. We identified the levels of granularity of modeling processes that correspond to Crop2ML concepts such as ModelUnit and ModelComposite in each platform. We also considered how documentation or model specifications are described in these platforms.

The export process, from Crop2ML to platforms, is done automatically in BioMA, OpenAlea and Simplace. The modularity principle in BioMA matches that of Crop2ML, which allows associating simple and composite BioMA strategies with Crop2ML ModelUnit and ModelComposite, respectively. Moreover, all the Crop2ML elements are well translated into the VarInfo type attributes (Donatelli & Rizzoli, 2008), and Crop2ML model algorithms are transformed to a method of a strategy class that takes generated domain classes as inputs. OpenAlea relies on two families of approaches: component-based architecture and scientific workflows. Thus, Crop2ML exports ModelUnits as OpenAlea components and ModelComposites as OpenAlea workflows. A ModelComposite can thus be visualized and edited using VisuAlea, the visual programming environment in OpenAlea. Widgets of a ModelUnit are automatically generated based on the type of each input, which is mapped to an OpenAlea interface. Simplace is based on the concept of software units called SimComponents, the smallest building blocks, which map to ModelUnits. ModelComposites are converted into a combination of SimComponents (SimComponentGroup). Variable and parameter descriptions are automatically included in the descriptive part of the SimComponents.
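The correspondences described above for the three platforms with automatic export can be summarized as a small lookup structure. This is a sketch for illustration only: the class `PlatformMapping`, the table `EXPORT_TARGETS`, and the function `target_concept` are hypothetical names and are not part of Crop2ML or CyMLT; only the platform names, languages, and concept pairings come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformMapping:
    language: str         # implementation language of the platform
    model_unit: str       # platform concept matched to a Crop2ML ModelUnit
    model_composite: str  # platform concept matched to a Crop2ML ModelComposite


# Pairings taken from the text; the data structure itself is hypothetical.
EXPORT_TARGETS = {
    "BioMA": PlatformMapping("C#", "simple strategy", "composite strategy"),
    "OpenAlea": PlatformMapping("Python", "component", "workflow"),
    "Simplace": PlatformMapping("Java", "SimComponent", "SimComponentGroup"),
}


def target_concept(platform: str, crop2ml_concept: str) -> str:
    """Return the platform concept that a Crop2ML concept is exported to."""
    mapping = EXPORT_TARGETS[platform]
    if crop2ml_concept == "ModelUnit":
        return mapping.model_unit
    if crop2ml_concept == "ModelComposite":
        return mapping.model_composite
    raise ValueError(f"unknown Crop2ML concept: {crop2ml_concept}")


print(target_concept("OpenAlea", "ModelComposite"))
```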

Table 3. Declaration of the inputs and algorithm of a Crop2ML ModelUnit of the Penman-Monteith evapotranspiration model and the equivalent source code generated by CyMLT for Record, BioMA, and DSSAT. The declaration of a single variable is given as an example.

models provided by different frameworks through a communication interface. The model components are often wrapped and are represented as black-box components. Not all state variables are exposed as model outputs, which may limit their integration in an existing modeling solution. Therefore, these frameworks enhance model reuse in their own environment but they do not address reusability with other PBM platforms. Many existing PBM platforms do not support the coupling of models written in multiple languages (e.g. BioMA, APSIM next generation). Donatelli and Rizzoli (2008) proposed a design pattern for platform-independent model components to enhance modularity and to facilitate model reuse in several PBM platforms via simple wrappers. However, this approach fixes the structure of the components. The lack of specification or meta-information makes the reuse of model components between platforms difficult. Even in component-based systems, explicit information about the component itself and its inputs and outputs (types, units and boundary conditions) is required to ensure syntactic composability and to meet the specificities of the platforms. Moreover, knowledge of the structure underlying the source code of a component is also required to systematically extract model information (variables and algorithms) for its transformation and integration in different platforms. We thus argue that model component reuse is improved if it is supported by model specification. Crop2ML defines an abstract representation of model design shared by PBM platforms through some shared concepts enriching or extending those proposed by Athanasiadis et al. (2011) with other attributes and a formal and shared description of unit tests. We included unit tests in Crop2ML specifications to ensure model transformation validation and some imperative constructs for model dynamics.

Several initiatives have used declarative modeling to describe model specifications and address model reuse issues. The approach proposed by Villa et al. (
crop model component reuse by proposing the Crop2ML framework. Despite all the differences between PBM platforms, some common features can be identified that enabled model representation regardless of the platforms' specificities. Crop2ML provides structured concepts to support the definition of ModelUnit and ModelComposite and allows their transformation to make them compatible with PBM platforms at implementation level. Therefore, Crop2ML defines a new

Table 1. Category, definition, and example of variables and parameters in Crop2ML.
[...] Examples: leaf area index, weight of a plant part, canopy temperature.
Rate: Defines the change of one state variable. Examples: transpiration rate, leaf growth rate.
Auxiliary: Intermediate variable computed by an auxiliary [...]. Examples: dry matter partitioning, shoot number.

Crop2ML currently supports four basic types: integer, double, strings and logical. It also supports two collection types: lists and arrays, which contain a sequence of elements of basic types. They are explicitly specified in a datatype attribute, similar to the VarInfo type

In BioMA, VarInfo attributes are extracted from BioMA strategies to produce Crop2ML model meta-information. The process of automatically retrieving the [...]. Like BioMA, the SimComponent-specific descriptors in Simplace allow generating parameters. Thus, the generation of model description in Crop2ML is partial. It requires further description of components that can be provided in documentation or by extending OpenAlea concepts

Table 2. Import and export processes between Crop2ML and PBM platforms. A, automatic; P, partially automatic; M, manual.
As an example, BioMA provides the Dictionary data type, which is a set of keys associated with values, to represent either input or output variables. This data type is not handled by Crop2ML or by most PBM platforms. As an alternative, dictionaries can be expressed in Crop2ML as two list datatype variables that represent the keys and values of the dictionary.

The simulation algorithm defining the feedback loop is explicitly described as control flow in some platforms (e.g. BioMA) but this is not the case in other platforms (e.g. Record, where the VPZ file representing the simulation model file is handled by the simulation engine VLE). Different simulation engines are based on the different models of computation used by the platforms, such as dataflow (e.g. OpenAlea), DEVS simulation (e.g. Record), and control flow (e.g. BioMA, DSSAT, and Simplace). These models of computation are used to coordinate the execution of the model. The current version of the Crop2ML framework does not take into account the specificities of simulation engines and addresses components which can be sequentially composed.
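The representation of a dictionary as two list variables, one holding the keys and one holding the values, can be sketched as follows. The function names and the sample variable are hypothetical and are not part of the Crop2ML specification; the sketch only shows that the conversion is lossless when both lists keep the same order and length.

```python
from typing import Dict, List, Tuple


def dict_to_lists(mapping: Dict[str, float]) -> Tuple[List[str], List[float]]:
    """Split a dictionary into parallel key and value lists."""
    keys = list(mapping)
    values = [mapping[key] for key in keys]
    return keys, values


def lists_to_dict(keys: List[str], values: List[float]) -> Dict[str, float]:
    """Rebuild the dictionary from the two parallel lists."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    return dict(zip(keys, values))


# Hypothetical variable: dry mass per plant organ.
organ_mass = {"leaf": 1.2, "stem": 0.8, "root": 0.5}
organ_names, organ_values = dict_to_lists(organ_mass)
assert lists_to_dict(organ_names, organ_values) == organ_mass
```

The trade-off is that the pairing between a key and its value becomes implicit in the list positions, so both lists have to be updated together.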