Computers and Electronics in Agriculture

The data deluge following the rise of the Internet of Things contributes towards the creation of non-reusable data silos. Especially in the environmental sciences domain, syntactic and semantic heterogeneity hinders data re-usability, as manual labour and domain expertise are most often required. Both the different syntaxes under which environmental timeseries are formatted and the implicit semantics which are used to describe them contribute to this end. Usually, the real meaning of data is obscured in a combination of short data labels, titles and various value codes that require domain or institutional knowledge to decipher. The FAIR data principles for scientific data sharing and stewardship offer a framework based on community-adopted metadata. In this work, we present the Environmental Data Acquisition Module (EDAM) which focuses on data interoperability and reuse, and deals with syntactic and semantic heterogeneity using a template approach. Data curators draft templates to describe in an abstract fashion the syntax of the timeseries datasets they want to acquire or disseminate. They complement each template with a metadata file, which is used to annotate observables and their properties (including physical quantities and units of measurement) with terms from an ontology. EDAM employs a reasoner to infer compatibility among syntactically and semantically heterogeneous datasets, and enables timeseries, format and units of measurement transformation on-the-fly. Our approach utilizes a local ontology to store metadata about datasets, which enables EDAM to acquire and transform datasets which were originally stored with different semantics and syntaxes. We demonstrate EDAM in a case study where we transform meteorological input files of four agricultural models.
Our approach cuts across environmental data silos and facilitates timeseries reusability, as it enables users to (a) discover datasets in other formats, (b) transform them and (c) reuse them in their scientific workflows. This directly contributes to the toolshed for FAIR data management in environmental sciences. The EDAM implementation has been released under an open-source license.


Introduction
Scientists and environmental practitioners are nowadays confronted with a vast array of legacy environmental datasets that become available online, and also with new data produced by Internet of Things (IoT) devices. Raw data must undergo certain modifications in order for new knowledge to be produced by environmental models (Rizzoli et al., 2007). However, transforming a dataset to be compatible with a certain data specification is a laborious process (Horsburgh et al., 2009) and usually requires the intervention of a human expert (Athanasiadis, 2015). This process hinders environmental data reusability (Ames et al., 2012), facilitates the formation of data silos (Terrizzano et al., 2015) and ultimately widens the data-to-knowledge gap (Elag et al., 2017).
Semantic heterogeneity among the legacy datasets hinders automatic data transformation. The interdisciplinary nature of environmental sciences impedes reusability, which is essential in the era of (big) data (Rizzoli et al., 2007; Wilkinson et al., 2016). Environmental timeseries are typically curated by several organizations and are annotated with implicit semantics (Beran and Piasecki, 2009). The real meaning of the data is obscured in a combination of short data labels or titles combined with institutional knowledge. Such implicit semantics concern the physical quantity that was measured (e.g. temperature), the units which were used (e.g. degrees Celsius), and the physical phenomenon (entity or process) that it was measured on (e.g. atmosphere surface air). Often there is implicit knowledge about temporal and spatial references and the observation and measurement protocol. As an example, atmosphere surface air temperature is typically measured with thermometers placed in shelters positioned two meters above ground, according to the World Meteorological Organization (WMO) specifications. Typically, unit selection differs among countries, regions and even among scientific disciplines and domains (Gkoutos et al., 2012). The different ways observables are quantified with respect to units of measurement add to semantic heterogeneity. This can lead to errors in data reuse and interpretation (Horsburgh et al., 2009) and renders data transformation to other formats a rather manual process, which eventually hinders data reuse beyond disciplinary silos.
The FAIR data principles have been introduced to formalize the requirements that scientific data must adhere to, in order to become findable, accessible, interoperable and reusable, by both humans and machines (Wilkinson et al., 2016). A key component of the FAIR data management principles is the adoption of high-quality metadata using community-adopted standards. Utilizing ontologies to support semantic interoperability is not a new concept in the environmental data domain (Bowers and Ludäscher, 2004;Madin et al., 2007;Madin et al., 2008;Rizzoli et al., 2008;Athanasiadis et al., 2011;Gruber and Olsen, 1994). An ontology represents the knowledge of a certain domain in a formalized manner through a set of statements (axioms) that define concepts and relationships between concepts (Villa et al., 2009). In the environmental domain, an ontology has been used to identify the physical processes, quantities, and their attributes (e.g. units of measurement) in a standardized manner (Yu and Liu, 2015). There have been several ontologies related to environmental sciences developed in the past decades, which received rather limited adoption (Athanasiadis, 2015). There is also a movement to facilitate data interoperability and reusability through the creation of new ontologies and dictionaries, which will be suitable for the Web (Rijgersberg et al., 2011;Compton et al., 2012). However, no clear winner exists among all these ontologies. One potential way to cope with ontology heterogeneity is through the use of intermediate steps such as vocabulary alignment or ontology mediation.
Ontology mediation refers to the process of describing different datasets through one ontology so that a common context is created and values can be reused (Regueiro et al., 2017). It is used to integrate diverse datasets, each of which is described by a different ontology, in order to become interoperable and reusable (Wilkinson et al., 2016; Shu et al., 2015). A semantic reasoner is a software agent which is an essential component in the ontology mediation process. By definition, it infers the implicit relations of an ontology (Mishra and Kumar, 2011), but can also support mediation among a number of them.
This work focuses on the interoperable and reusable principles for scientific data management. We present a declarative approach to cope with semantic heterogeneity in order to automate environmental timeseries processing and transformation. For each data file, we use a template to describe its syntax and a metadata file to annotate the corresponding observables through a vocabulary. Then, a semantic reasoner parses the metadata files and resolves relationships across the different data files. Data stored in a specific format can be automatically transformed to another syntax, with the reasoner inferring compatibility among the corresponding observables. Also, we incorporated a unit of measurement transformation module. We demonstrate this with the weather input files of four crop modelling solutions, namely APSIM, AgMIP (Rosenzweig et al., 2013), DSSAT (Jones et al., 2003), and WOFOST (Diepen et al., 1989) and the meteorological timeseries data provided by the Koninklijk Nederlands Meteorologisch Instituut (KNMI).
The rest of the paper is structured as follows: Section 2 reviews contemporary approaches towards environmental data transformation and gives the background of template frameworks. Section 3 presents the objectives along with abstract architectural design of our approach and overviews its implementation. Section 4 demonstrates the application of the semantic approach and the used datasets. Finally, Section 5 discusses our initial key findings, identifies future work and concludes the research.

Background and related work
The ultimate objective of ontology-driven approaches is the integration of semantically heterogeneous datasets (Villa et al., 2009), i.e. the creation of a consolidated view of datasets that were originally curated differently, and annotated with different ontologies. This may enable having a single endpoint to submit queries to these heterogeneous datasets (Beran and Piasecki, 2009), providing seamless, frictionless access. In the environmental domain this process is described with many concepts: the terms mediation (Regueiro et al., 2017), translation (Shu et al., 2015) and integration (Leinfelder et al., 2010; Beran and Piasecki, 2009) are synonyms and have been used interchangeably. In the environmental data science literature we discern three approaches towards semantic interoperability, which are based on either:
1. approaches which support syntactic interoperability, e.g. environmental data management frameworks such as the ones offered by Open Geospatial Consortium (OGC) and CUAHSI (Ames et al., 2012), or spreadsheets (de Vos et al., 2017),
2. Semantic Web stack technologies (e.g. RDF datastores, SPARQL, etc.) (Ziébelin et al., 2017), and
3. scripts that create custom-to-dataset solutions.
Usually, the last two approaches cope with both syntactic and semantic heterogeneity at once.
Transformation of syntactically heterogeneous environmental timeseries into a consistent format is the concept around environmental data management frameworks. These frameworks, such as the OGC SOS (Bröring et al., 2012) and the CUAHSI HIS (Ames et al., 2012), cope with syntactic heterogeneity by hiding the implicit syntaxes of diverse datasets and offering them through consistent data models (e.g. O&M (Cox, 2011), WaterML (Taylor, 2014)). Efforts have also been made towards supporting semantic interoperability of such well-established frameworks. Henson et al. designed a semantic extension for the OGC SOS in order to submit high-level queries to raw data (Henson et al., 2009). Regueiro et al. (2017) use controlled vocabularies to align semantics of various data sources for semantic mediation tailored to the OGC SOS protocol. Beran and Piasecki developed a knowledge base on top of syntactically interoperable, CUAHSI WaterML formatted datasets. In Beran and Piasecki (2009), they related terms from local vocabularies, which were used to annotate environmental datasets, with terms from a universal ontology. This way, they addressed semantic heterogeneity and provided an endpoint, called Hydroseek, to submit queries to heterogeneous datasets curated by various environmental agencies. While Hydroseek further ensures interoperability by exporting all data in a standardised MS Excel format, users have to undergo additional transformations in order to use the data in their scientific workflows. Additionally, Hydroseek mainly supports data consumers, as the process to add new data repositories is not detailed and seems to be carried out by the platform creators.
The standardised structure offered by spreadsheets made their utilization popular in the environmental data science domain (de Vos, 2017). This structure accounts for syntactic interoperability, and thus efforts have been made in order to complement those with semantic capabilities. Shu et al. (2015) present an ontology-mediation approach to deal with the translation of environmental data encoded in spreadsheets into XML. De Vos et al. (2017) present an ontology mediation approach which concerns the annotation of natural spreadsheets using external vocabularies, in order to identify the domain model implicitly defined in these natural spreadsheets.
Semantic Web stack technologies, such as Linked Open Data, allow for addressing both syntactic and semantic heterogeneity. The approaches which fall into this category usually transcribe datasets into semantic-enabled datastores (triplestores) in order to support semantic data linking, processing and querying. For example, Yu and Liu provide a single SPARQL endpoint to perform semantic queries to all underlying datasets (Yu and Liu, 2015). Bizer and Cyganiak present a tool, called D2R server, which publishes data stored in relational databases to a Semantic Web compatible format (Bizer and Cyganiak, 2006). Langegger et al. describe a mediator-based system for virtual data integration of scientific data (Langegger et al., 2008). Ziébelin et al. demonstrate a framework which uses the D2R server (Bizer and Cyganiak, 2006) to semantically link and integrate heterogeneous hydrological data sources. Interestingly, they offer enhanced interoperability, as they disseminate the underlying, integrated datasets through OGC services (Ziébelin et al., 2017).
Environmental timeseries integration and transformation via scripting have been previously investigated within the agricultural domain. Porter et al. developed small software programs, called translators, to transform the weather data files of four agricultural models into the AgMIP-consistent data format (Porter et al., 2014). Similarly, Woodard in Ag-Analytics developed Python scripts to acquire diverse datasets, store them into a consistent data schema and then offer the transformed data as a service (Woodard, 2016). In both works, the proposed solutions address the syntactic and semantic heterogeneity by aligning all datasets to a consistent data syntax with a common data model.
The work presented here is built upon a mechanism which accounts for syntactic interoperability, and thus falls into the scope of the first approach. Associating a dataset with an abstract representation of its syntax contributes towards syntactic interoperability. Papoutsoglou et al. introduced the notion of using a template to describe a dataset syntax, parse the corresponding datapoints and offer them as services on the web (Papoutsoglou et al., 2015). In Samourkasidis et al. (2018) we designed and demonstrated a template framework for data acquisition to cope with syntactic heterogeneity. Using this framework, e-scientists without a strong computer science background can acquire and reuse environmental timeseries from various outlets (e.g. webpages, local files, databases, etc.) and create custom views of data using templates. In this work we extend this template framework with a declarative approach to cope with semantic heterogeneity. The benefits of our approach are described in the next section.

Objectives
There are three objectives in designing and developing a system to support automatic transformation of heterogeneous datasets. The first is to lower the environmental data science barriers, as the target users are e-scientists. As mentioned in Section 2, curating environmental datasets is a manual and custom process. In order to cope with semantic heterogeneity and interpret data, users should possess the implicit domain knowledge incorporated in environmental datasets. In this work, we embraced a declarative approach to cope with semantics, which does not require from users more technical skills than those they already have.
The second objective is to support the discovery of compatible datasets. We consider one dataset to be compatible with another only if the observables reported in the first are equivalent to those reported in the other. A semantic reasoner determines compatibility, based on semantic annotations provided by users. This enables users to find compatible datasets of interest, originally stored in other formats.
The third objective is automatic timeseries transformation between compatible formats. The automatic timeseries transformation to different formats consists of two steps: (a) syntax transformation, and (b) content transformation. The former concerns the layout transformation, such as the column order or naming. We focused on the latter, that is the unit of measurement transformation of the observables reported in a source dataset to match the ones of a target dataset. Our approach cuts across environmental data silos and facilitates timeseries reusability, as it enables users to (a) discover datasets in other formats, (b) transform them and (c) reuse them in their scientific workflows. This directly contributes to the toolshed for FAIR data management in environmental sciences.

Abstract architectural design
There are three key components involved in the design of our system: (a) template files, (b) metadata files and (c) a reasoner. Fig. 1 depicts the interaction among the components. According to our approach, each distinct data syntax is represented through a template and a metadata file. The reasoner parses the metadata files, stores the ontology definitions for the reported observables in a local ontology and infers compatibility among their corresponding templates. The inferred relationships cope with the semantic heterogeneity, as they support timeseries transformation among compatible syntaxes.
Fig. 1. The different shapes represent heterogeneous data syntaxes. For each different syntax, the template copes with syntactic, and the metadata file with semantic heterogeneity. The reasoner infers the compatibility of the data syntaxes based on the ontology definitions declared in the metadata files. The red arrow depicts a relationship, inferred by the reasoner, between the templates of two different data syntaxes. Dataset 2 can be automatically transformed to the syntax of Dataset 1, as it is an extension of it: it comprises the same observables plus some extra ones.
A template file is an abstract representation of a data file's contents using programming-language-agnostic semantics. Users draft one template for each data file syntax. They annotate important parts of the dataset using variables. Then, they define the metadata of the observables represented by these variables in a metadata file. A metadata file is bound to a single template and consists of semantic annotations for the reported observables. Users describe each observable through a name (e.g. Temperature), an ontology class (e.g. ontology:ObservableClass), and if applicable with qualifiers (e.g. max, min, daily). They also provide information about the corresponding units of measurement. For each unit of measurement, the name and symbol are mandatory fields, while a definition through an ontology class is optional. For both observables and units of measurement, users can define equivalent classes from other ontologies.
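To make the fields of a metadata entry concrete, the following is a minimal sketch of how one annotated observable could be modelled in Python. It is not the EDAM metadata file format, which is not reproduced here; the class names, field names and the `ontology:Temperature` term are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Unit:
    """A unit of measurement: name and symbol are mandatory."""
    name: str
    symbol: str
    ontology_class: Optional[str] = None  # optional ontology definition

    def __post_init__(self):
        if not self.name or not self.symbol:
            raise ValueError("a unit needs both a name and a symbol")


@dataclass(frozen=True)
class Observable:
    """One annotated template variable (hypothetical layout)."""
    variable: str                  # variable used in the template
    name: str                      # e.g. "Temperature"
    ontology_class: str            # e.g. "ontology:Temperature"
    unit: Unit
    qualifiers: Tuple[str, ...] = ()          # e.g. ("max", "daily")
    equivalent_classes: Tuple[str, ...] = ()  # terms from other ontologies


# Hypothetical annotation of a daily maximum temperature column.
maxt = Observable(
    variable="maxt",
    name="Temperature",
    ontology_class="ontology:Temperature",
    unit=Unit(name="degree Celsius", symbol="degC"),
    qualifiers=("max", "daily"),
)
```

Making the records immutable reflects that a metadata file is bound to a single template and is only read by the reasoner.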
The reasoner parses the metadata files in order to infer transformation compatibility among the templates. Firstly, it creates an instance for each ontology class found in each metadata file. If applicable, it generates on-the-fly concrete subclasses to combine the abstract observable along with its related qualifier(s). For example, maxDailyTemperature is a Temperature subclass which combines a statistical (i.e. max) and a temporal (i.e. daily) qualifier. Secondly, it defines a new class for each template, which is described by a general rule called an axiom. This axiom asserts in ontology language that the given template comprises certain observables.
Next comes the unit of measurement transformation. The parser calculates the conversion factor between each pair of compatible observables. This calculation is based on the units of measurement which are defined in the source and target metadata files, respectively. Finally, the conversion factors are applied on-the-fly (if applicable) on each column, and the columns are then arranged according to the target template, in order for the dataset instance to be presented to the user.
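The on-the-fly generation of composite subclasses can be illustrated with plain Python classes standing in for ontology classes. This is only a sketch under that assumption: the actual system stores classes in an ontology, and the function name `composite_class` and the naming rule are hypothetical, chosen so that the example from the text (maxDailyTemperature) is reproduced.

```python
class Observable:
    """Stand-in for the root of the observables branch."""


class Temperature(Observable):
    """An abstract observable declared in a metadata file."""


def composite_class(base, statistical=None, temporal=None):
    """Create a subclass of `base` named after its qualifiers.

    Without qualifiers the abstract observable itself is returned.
    """
    parts = [q for q in (statistical, temporal) if q]
    if not parts:
        return base
    # First qualifier stays lower-case, later ones are capitalised.
    prefix = parts[0] + "".join(p.capitalize() for p in parts[1:])
    return type(prefix + base.__name__, (base,), {})


MaxDaily = composite_class(Temperature, statistical="max", temporal="daily")
print(MaxDaily.__name__, issubclass(MaxDaily, Temperature))
```

The subclass relation is what later lets a reasoner treat a qualified observable as a special case of the abstract one.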
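The conversion step can be sketched as follows. The sketch does not use the unit library of the implementation; instead it assumes a small hand-written factor table (`TO_BASE`), and the function names are hypothetical. It also covers only units related by a multiplicative factor, so offset-based conversions (such as degrees Celsius to kelvin) are out of its scope.

```python
# Hypothetical factor table: symbol -> (dimension, factor to a base unit).
# This is a stand-in for a unit library, not part of EDAM.
TO_BASE = {
    "mm": ("length", 1.0),
    "cm": ("length", 10.0),
    "m": ("length", 1000.0),
    "m/s": ("speed", 1.0),
    "km/h": ("speed", 1.0 / 3.6),
}


def conversion_factor(source, target):
    """Return f such that value_in_target = f * value_in_source."""
    src_dim, src = TO_BASE[source]
    tgt_dim, tgt = TO_BASE[target]
    if src_dim != tgt_dim:
        raise ValueError(f"{source} and {target} measure different quantities")
    return src / tgt


def convert_column(values, source, target):
    """Apply the conversion factor to every value of one column."""
    factor = conversion_factor(source, target)
    return [v * factor for v in values]


# A rainfall column reported in cm, requested in mm.
print(convert_column([1.0, 2.5], "cm", "mm"))
```

Computing the factor once per column, rather than per value, mirrors the description above: factors are derived from the two metadata files and then applied column by column.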

Use of ontologies
We used a local ontology, which can map concepts and classes defined in different ontologies. This ontology comprises three high-level classes: Observables, Qualifiers, and Templates. In the Observables class, we create subclasses for the observables of each dataset, as defined by users in the metadata files. In this version, we annotated observables and units of measurement with classes from a custom, local ontology.
The template variables which are used to describe the dataset are stored as instances of the corresponding Observable subclass. We keep different namespaces for the instances of each template. This will enhance findability since each template will have its own prefix. So even for two templates using the same naming for their instances, there will be a distinction among them, based on the used prefixes (e.g. AgMIP:rain and WOFOST:rain). The namespaces can be optionally defined in the metadata file. In case they are missing, they can be generated based on the template file name.
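A minimal sketch of the prefixing rule described above, assuming the prefix falls back to the template file name without its extension. The function names and the `.tmpl` extension are hypothetical and only serve the illustration.

```python
from pathlib import Path


def template_prefix(template_path, declared=None):
    """Use the namespace declared in the metadata file, else the file stem."""
    return declared if declared else Path(template_path).stem


def qualified(prefix, variable):
    """Build a prefixed instance name such as 'AgMIP:rain'."""
    return f"{prefix}:{variable}"


# Two templates that both use the variable name 'rain' stay distinct.
print(qualified(template_prefix("templates/AgMIP.tmpl"), "rain"))
print(qualified(template_prefix("templates/WOFOST.tmpl"), "rain"))
```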
The Qualifiers class is further refined into Statistical and Temporal mutually disjoint subclasses. Based on user input in metadata files, we define local statistical (e.g. max, min, mean, etc.) and temporal (e.g. daily, hourly, etc.) qualifiers and create their subclasses accordingly. A qualifier should always accompany an observable. In the end the system creates a composite subclass (e.g. maxTemperature) from the Observable (e.g. temperature) and its related Qualifier(s) (e.g. max) subclasses.
The Templates superclass holds the template definitions. We create a subclass for each distinct template along with its axiom definition. The axioms have direct reference to the Observables subclasses. The semantic reasoner uses these subclasses, when it comes to inferring compatibility among datasets.
Inferring compatibility among templates is facilitated by the local ontology and its properties. A hasObservable object property was defined to establish relationships among the Templates classes and their corresponding Observables. The axiom of a template with N associated observables defined with the ontologyA is expressed in OWL as follows:
Templates and (hasObservable some ontologyA:observable1) and (hasObservable some ontologyA:observable2) … and (hasObservable some ontologyA:observableN)
Based on the template axioms, the reasoner infers one of four states of compatibility between two data syntaxes. If A is the source and B the target template representing different data syntaxes, the possible states are:
a. A is equal to B: both templates comprise the same number of equivalent observables.
b. A is an extension of B: template A contains all equivalent observables reported in template B, plus one or more additional observables.
c. A is a reduction of B: the reverse of (b).
d. A is non-compatible to B: templates A and B may or may not have observables in common, but neither contains all observables of the other.
A dataset represented with template A can automatically be transformed to the syntax of template B in the first two cases.
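The four states can be illustrated with plain set comparisons over the observable classes of two templates. This is a simplified sketch, not the reasoner itself: a description-logic reasoner derives the same relations from the template axioms, while the code below assumes that equivalent observables have already been mapped to identical class names. All function names and the example observable sets are hypothetical.

```python
from enum import Enum


class Compatibility(Enum):
    EQUAL = "equal"
    EXTENSION = "extension"
    REDUCTION = "reduction"
    NON_COMPATIBLE = "non-compatible"


def template_axiom(observables, prefix="ontologyA"):
    """Render the template axiom as text, in the form shown above."""
    clauses = [f"(hasObservable some {prefix}:{o})" for o in observables]
    return " and ".join(["Templates"] + clauses)


def compare(source, target):
    """Classify source template A against target template B."""
    a, b = frozenset(source), frozenset(target)
    if a == b:
        return Compatibility.EQUAL
    if a > b:  # A has everything in B plus something more
        return Compatibility.EXTENSION
    if a < b:
        return Compatibility.REDUCTION
    return Compatibility.NON_COMPATIBLE


def can_transform(source, target):
    """Automatic transformation is possible for states (a) and (b)."""
    return compare(source, target) in (Compatibility.EQUAL,
                                       Compatibility.EXTENSION)


# Hypothetical observable sets, not taken from the case study.
template_a = {"maxTemperature", "minTemperature", "Rainfall", "WindSpeed"}
template_b = {"maxTemperature", "minTemperature", "Rainfall"}
print(compare(template_a, template_b), can_transform(template_a, template_b))
print(template_axiom(["observable1", "observable2"]))
```

The asymmetry of `can_transform` reflects that a target can only be filled when every observable it requires is present in the source.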

Implementation
This approach extends the EDAM template framework Python module reported in Samourkasidis et al. (2018). It extends the template framework for data acquisition, which already copes with syntactic heterogeneity, with a new module to support semantic operations. The system comprises a parser and a semantic reasoner: EDAM supports the syntax transformation, the Owlready2 Python library (Lamy, 2017) the ontology engineering and semantic reasoning, and the Pint Python library (Grecco, 2019) the unit of measurement transformation. The EDAM implementation is available via Python-pip under an open-source license.
We reused open source projects to provide further functionality. Specifically, we developed a parser to extract user definitions about observables and units of measurement from the metadata files, and utilized Owlready2 to store them in a local ontology. Additionally, Owlready2 supports the semantic reasoning to infer compatibility among the semantically heterogeneous datasets. We utilized Pint to support the unit of measurement transformation. Pint calculates the multiplication factor between two units (i.e. source and target), based on their symbols. By design, Pint supports all SI symbols and their derivatives.

Limitations
The system presented here is intended for environmental timeseries. The system can handle the same file types as EDAM (Samourkasidis et al., 2018), i.e. text-based timeseries stored locally or remotely in one or more files, websites and/or relational databases.
Towards inferring compatibility among datasets, the system takes into consideration only the observable section in metadata files. The temporal (e.g. hourly, daily, etc.) and/or statistical (e.g. min, max, mean, etc.) dimensions of the reported observables should be defined as qualifiers. By definition, observables that are reported in different temporal resolutions or regard different statistical values are not compatible. For example, the following sets of source to target transformations are (mutually) incompatible (a, b, c).
Table 1. The implicit semantics used by each data syntax to refer to the corresponding observables.
A. Samourkasidis and I.N. Athanasiadis, Computers and Electronics in Agriculture 169 (2020) 105171
Users can refer to terms from external ontologies, but these are not directly imported. EDAM creates a local ontology with these terms, which serves as a dictionary among the used terms. In this version, external ontologies are not imported for further use.

The automatic transformation refers to the syntax and unit of measurement transformation. Any type of resampling in order to match source and target dataset temporal resolution is not included in the transformation process. Although EDAM offers this service, this is considered as a preprocessing step. Additionally, any possible spatial metadata are not considered when inferring compatibility.

Case studies
We demonstrate our semantic approach towards environmental timeseries transformation with the weather data files of four environmental models. Table 1 presents the selected datasets, the reported observables along with their implicit semantics and units of measurement. Besides the different semantics and units of measurement, each dataset has a different timeseries syntax.
For each dataset we developed a template to cope with the diverse syntaxes and a metadata file to annotate the reported observables. Fig. 2 depicts an excerpt of an input dataset for APSIM, the corresponding template (Fig. 2b), and the metadata file (Fig. 2c). The variable names inside the {{}} placeholders, which are used to draft the template, are reused in the metadata file to relate observables with their actual meanings. The observables are semantically annotated using a local ontology.
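To give an impression of how {{}} placeholders can drive parsing, the sketch below turns one template line into a regular expression with one named group per variable. It is an assumption-laden illustration rather than the EDAM parser: the template line, the whitespace-separated layout and all variable names except maxt are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def template_to_regex(template_line):
    """Compile a template line into a regex with named groups."""
    pattern, pos = "", 0
    for match in PLACEHOLDER.finditer(template_line):
        literal = template_line[pos:match.start()]
        # Literal text is matched verbatim; any whitespace run is flexible.
        for token in re.split(r"(\s+)", literal):
            if token.isspace():
                pattern += r"\s+"
            elif token:
                pattern += re.escape(token)
        pattern += f"(?P<{match.group(1)}>\\S+)"
        pos = match.end()
    pattern += re.escape(template_line[pos:])
    return re.compile("^" + pattern + "$")


# Hypothetical template line and a matching data line.
line_regex = template_to_regex("{{year}} {{day}} {{maxt}} {{mint}}")
record = line_regex.match("1990   1  30.5 18.2").groupdict()
print(record)
```

Because the group names equal the template variables, the parsed values can be looked up directly with the names that the metadata file annotates.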

Compatible datasets
The reasoner operated on the five metadata files and updated the local ontology, which can be further edited through dedicated ontology editors. It stored a class for each template, and automatically defined the template axiom based on the ontology classes of the related observables. Fig. 3 is a screenshot of the Protege ontology editor (Gennari et al., 2003) which depicts the asserted and inferred relationships among the datasets. The class-subclass hierarchy depicts the relationships among the datasets. In principle, the subclass dataset is an extension of the parent class and thus automatic transformation is possible. Based on the template axioms, the reasoner inferred the following relationships:
• DSSAT extends APSIM,
• AgMIP extends APSIM,
• KNMI extends APSIM,
• KNMI extends WOFOST,
• AgMIP extends WOFOST.
Data compatibility was inferred based on the combined observable classes. These were generated on-the-fly: one subclass for every abstract observable. For example, for the AgMIP TMAX, the abstract observable is Temperature and the statistical qualifier is max. This combination results in the on-the-fly generation of maxTemperature, which is a Temperature subclass.

Automatic transformation
The system automatically transformed the compatible datasets upon user request. Transformation comprises two parts: the syntax and the semantic (or content) transformation. The former was performed by EDAM. The challenge here is with the latter: the input and output templates use different semantics (i.e. observable identifiers). For example, AgMIP and APSIM datasets describe the max Temperature using the TMAX and maxt identifiers, respectively. The system established a relationship among the underlying observables of the input and output templates based on their compatibility. For example, it inferred that maxt and TMAX are synonyms and can be used interchangeably.
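Once two identifiers are known to denote the same observable class, renaming columns is a dictionary lookup. The sketch below assumes each metadata file has been reduced to a mapping from identifier to composite observable class; the function names, the EXTRA column and its class are hypothetical, while TMAX, maxt and maxTemperature come from the example above.

```python
def align_identifiers(source_meta, target_meta):
    """Map each target identifier to the source identifier of the same class."""
    by_class = {cls: ident for ident, cls in source_meta.items()}
    return {t_ident: by_class[cls] for t_ident, cls in target_meta.items()}


def transform_row(row, mapping):
    """Keep only the target observables, renamed to the target identifiers."""
    return {t_ident: row[s_ident] for t_ident, s_ident in mapping.items()}


# Identifier -> observable class, as it could be read from two metadata files.
agmip_meta = {"TMAX": "maxTemperature", "EXTRA": "someOtherObservable"}
apsim_meta = {"maxt": "maxTemperature"}

mapping = align_identifiers(agmip_meta, apsim_meta)
print(mapping)
print(transform_row({"TMAX": 30.5, "EXTRA": 1.0}, mapping))
```

A missing class raises a KeyError in this sketch, which corresponds to a target observable that the source does not report.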
Unit transformation is performed on-the-fly upon dataset request. The system calculated the required conversion factors between the source and target template units and applied them on the corresponding timeseries. Fig. 4 depicts a KNMI dataset (Fig. 4a) transformed according to the APSIM format (Fig. 4b). For this example, the conversion factors for the following unit of measurement transformation were calculated and applied on the source dataset: / m/s
Fig. 3. A screenshot of the developed ontology in the Protege software. The Observables class consists of the observable types found in the different syntaxes. Combinations of these subclasses describe each Template subclass. The reasoner inferred compatibility as depicted in the class-subclass hierarchy. For example, the DSSAT dataset can be automatically transformed to APSIM, as the former is an extension (subclass) of the latter.
The system implementation is able to handle incompatible transformation requests, and annotations with unresolvable units of measurement. When a non-compatible transformation is attempted, the system issues an error. This error informs the user about the (in)compatibility of the involved datasets. The system can also handle units of measurement that either are not expressed correctly or are not SI units. In both cases, the system sets the conversion factor to 1 (i.e. no transformation) and raises warning messages to the user. A frequently found non-SI unit in this case study is the percent unit (%).
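The error and warning behaviour described above can be sketched as follows. The compatible pairs are the relationships reported for the case study; everything else (the factor table, the exception class and the function names) is a hypothetical stand-in for the actual implementation.

```python
import warnings

# (source, target) pairs for which the reasoner inferred compatibility.
COMPATIBLE = {("DSSAT", "APSIM"), ("AgMIP", "APSIM"), ("KNMI", "APSIM"),
              ("KNMI", "WOFOST"), ("AgMIP", "WOFOST")}

# Hypothetical factors to a base unit (m/s); a stand-in for a unit library.
KNOWN = {"m/s": 1.0, "km/h": 1.0 / 3.6}


class IncompatibleTemplatesError(Exception):
    """Raised when the target cannot be derived from the source."""


def check_request(source, target):
    """Reject transformation requests between non-compatible templates."""
    if (source, target) not in COMPATIBLE:
        raise IncompatibleTemplatesError(
            f"{source} cannot be transformed to {target}")


def factor_or_one(source_symbol, target_symbol):
    """Return the conversion factor, or 1 with a warning if unresolvable."""
    try:
        return KNOWN[source_symbol] / KNOWN[target_symbol]
    except KeyError as missing:
        warnings.warn(f"unit {missing.args[0]!r} could not be resolved; "
                      "values are left unchanged")
        return 1.0


check_request("KNMI", "APSIM")   # passes silently
print(factor_or_one("%", "%"))   # warns and falls back to 1.0
```

Falling back to a factor of 1 keeps the transformation running, while the warning leaves the decision about the affected column to the user.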

Discussion and conclusions
Environmental modelling solutions require their own input types and formats. As datasets are curated by different organizations, there are important differences in terms of syntax and semantics. Even related modelling solutions, such as APSIM and DSSAT (Jones et al., 2017), annotate the same observables through different local vocabularies and sometimes report their observables in different units of measurement. Semantic heterogeneity hinders environmental timeseries reusability, as transforming a dataset to another format is a laborious process (Beran and Piasecki, 2009;Horsburgh et al., 2009) which requires human expert intervention (Athanasiadis, 2015). In this work we presented a declarative approach to support environmental timeseries transformation. We employed a reasoner to infer transformation compatibility between semantically heterogeneous datasets, and developed a system to support units of measurement transformation. This contributes to the implementation of FAIR data management principles, and showcases the importance of metadata in automated discovery and transformation of data. We demonstrated how users may annotate datasets using a vocabulary, employ a reasoner, and transform data into other compatible formats.
Note that inconsistency when annotating observables with qualifiers leads to incompatibility. That is, qualifiers should be used either in all or in none of the datasets involved. While this statement is rather obvious, ensuring consistency when human users manually annotate the datasets is challenging. Of the datasets we used, only the KNMI included statistical and temporal (daily) qualifiers in its observable documentation. Had one independently annotated such datasets, the KNMI:DailyMaxTemperature would be incompatible with APSIM:maxT, even though both refer to the very same observable. For this reason, we ignored the temporal and spatial metadata attributes for the KNMI data.
Automatic transformation of semantically heterogeneous data is essential towards environmental modelling in the IoT era. This work also supports e-science, as the manual processing of data is often erroneous (Horsburgh et al., 2009). We also consider that this approach contributes towards lowering the e-science barriers (Swain et al., 2016). The proposed declarative approach copes with semantic heterogeneity, and enables e-scientists to transform compatible datasets to a given format, without developing scripts or being ontology engineers themselves.
This work has an exploratory character and sets the groundwork for future work. In this proof of concept, we supported semantic mediation by enabling users to annotate the observables of the various datasets using a local ontology. The statistical and temporal qualifiers fundamentally change the meaning of the observables. However, most ontologies consider atomic measurements, thus statistical and temporal qualifiers are missing. Using qualifiers makes observable annotation complete at a dataset level, and enables logical reasoning.
A possible direction for future work could be towards a module that infers temporal qualifiers and supports automated annotation. Logically inconsistent annotations lead to incompatibility of datasets. Determining temporal qualifiers of an observable based on the contents of a data file is a step which would by design offer consistent data representations, limiting human intervention.
Another possible direction for future work may be the design of intermediate, semantic model-templates. These model-templates would derive missing observables by combining present ones. This is an essential step in cases where two syntaxes are not compatible because of a missing observable. An example from the case studies presented here is the incompatibility of the WOFOST and DSSAT datasets, because of the Vapor Pressure observable reported only in the former. An intermediate model-template to derive it, combining the Temperature and Dew point (present in the DSSAT), would allow automatic transformation between them.
Fig. 4. The reasoner inferred compatibility between KNMI and APSIM, and the dataset depicted in (a) was automatically transformed according to the APSIM format (b). The units of measurement transformation was performed on-the-fly.

Conclusions
In this work we presented our approach to cope with semantic heterogeneity towards transforming environmental timeseries data. We extended a data acquisition template framework which accounts for syntactic interoperability with a reasoner and a unit transformation module. This declarative approach enables users to annotate a data syntax once using terms from a vocabulary and then transform it to other compatible syntaxes. The employed reasoner infers the compatibility among different syntaxes, by creating a semantic description of each one. Then, the unit transformation module determines the relationship among the units and performs on-the-fly transformation where applicable. We demonstrated our declarative approach with the weather input files from four agricultural models and the meteorological timeseries data from the Dutch Meteorological Office. In all cases where the reasoner inferred compatibility between two distinct datasets, we were able to transform the syntax and the content of one to another.

Declarations of Competing Interest
None.