Relationship Between Spatial Datasets and Assessments of Mapped Ecosystem Services Indicators

A large amount of machine-readable data is already publicly available in open repositories. The task of this research is to identify and analyse resources from official Latvian open repositories, to find out whether any relationship exists between them and the indicator values from the pilot Ecosystem Services (ES) valuation project in the Jaunkemeri territory, and to show how correlations can be determined between open machine-readable spatial datasets and the mapped assessments of Provisioning, Regulating and Cultural ES indicators. The aim of the research is to create prerequisites for decision making in sustainable land development.


Introduction
A large amount of machine-readable data is already available in open repositories. Latvia is keeping pace with ICT progress. Since January 2006 the VISS infrastructure, based on SOA principles, has been available to Latvian ICT developers (Semenchuk, 2011). The VISS infrastructure includes TG for XML schema (WEB (a), 2016) and WS (WEB (b), 2015) developers, and an XML schema catalogue (WEB (c)). There are also two SDI metadata catalogues (WEB (d) and WEB (e)), in which part of the metadata is provided to fulfil the requirements of the INSPIRE directive and its implementing rules, and another part is provided for use at the local level. Moreover, an Open Data portal with data catalogue capabilities is available (WEB (f)).
On the other hand, we have data from the Latvian ES mapping pilot campaign. Mapping was completed in 2016 in two areas and the results are publicly available (WEB (g)), although not in a machine-readable form.
The following chapters describe the resources from Latvian open repositories and the valuated ES indicators from Jaunkemeri (Fig. 1), the ES mapping pilot project area (91 ha). In addition, the datasets and the ES indicators are compared with each other in order to detect a relationship.

Metadata and data open repositories
This chapter describes information from the main Latvian (and some other) open metadata and data repositories. It is important to understand that each metadata catalogue or data repository is managed by its own information system. The architecture of the information systems and the data models can differ significantly from one data holder to another. Understanding a software system is time-consuming, because it is difficult to understand the source code quickly without a model of the software system (Ovchinnikova and Asnina, 2014). Using a common software and data model infrastructure across data holders is highly desirable, because it makes software behaviour and data models easier to understand for every user and developer. In addition, Trinkunas and Vasilecas, after a thorough analysis of available knowledge sources, concluded that universal and commercial data models are the most suitable with respect to many quality properties (Trinkunas and Vasilecas, 2009).

The Latvian Geospatial Information Agency (LGIA) metadata catalogue is available at URL (WEB (d)). In March 2018 there were 80 records registered in it, and it was claimed that all metadata records conform to the ISO 19115 standard. Following Burkhard et al. (2009) and Holms et al. (2016), resources containing information about land use or land cover were selected for detailed analysis (see Table 1). This is because most ES assessment models are based on basic data from land use and land cover datasets, for example CLC. By using quantitative and qualitative assessment data in combination with land cover and land use information originating from remote sensing and GIS, the impacts of human activities can be evaluated (Burkhard et al., 2009).
According to Table 1, two categories of online resources are registered: Raster WS (WMS and WMTS) and Feature WS (INSPIRE Feature Download WS and WFS). Since both categories of resources exist and raster processing is computationally very expensive, we focus on the Feature resources. The Feature resource 'WFS Corine Land Cover Changes 2006 -2012 LV', which contains only the changes between CLC2006 and CLC2012, is not considered either. The online resource 'WFS INSPIRE Land Cover LV' does not support the DescribeFeatureType operation and cannot be downloaded, at least at the moment. However, the information from the online resource 'WFS Revised Corine Land Cover 2006 LV' that intersects our area of interest was successfully retrieved. Only two Corine polygons overlap our area of interest (Fig. 2). According to the nomenclature (WEB (h), 2010), '142' means 'Sport and leisure facilities' and '312' means 'Coniferous forest'. However, the accuracy of CLC maps is not high enough. The CLC is a vector map with a scale of 1:100 000, an MCU of 25 ha and a geometric accuracy better than 100 m. It maps homogeneous landscape patterns, i.e. more than 75% of the pattern has the characteristics of a given class from the nomenclature (WEB (h), 2010).

Latvian metadata catalogue. GDC (Geospatial data connector)
The provider of the Latvian metadata catalogue is the Latvian State Regional Development Agency. The content of the Latvian metadata catalogue and a summary of the referenced spatial datasets and services for 2017 were described by Holms and Vitols (2017b). In March 2018 there were 251 records registered; the declared metadata standard name is 'INSPIRE Metadata for Datasets' for 106 records, 'INSPIRE Metadata for Services' for 110 records and 'LATVIAN Metadata for Data' for 35 records. An overview of the data from datasets registered in the Latvian spatial metadata catalogue was published in 2017 by Holms and Vitols (2017a). To complement the CLC2006 information from the LGIA metadata catalogue, records and data about Latvian soil (see Table 2) were selected for detailed analysis and comparison with ES indicators. According to Table 2, two types of online resources are registered: predefined datasets in GML and Raster WS (WMS). We do not focus on the resource 'Vēsturiskā augsnes digitāla datubāze - augsnes dziļrakumi (INSPIRE lejupielādes pakalpojums)', because it is a set of points, whereas we need polygon objects that lie in our area of interest. The resource 'Vēsturiskā augsnes digitāla datubāze - augsnes laukumi (INSPIRE lejupielādes pakalpojums)' fulfils the requirements, since its geometry type is a kind of polygon, but it does not cover our area of interest (see Fig. 3). The resource is available as zipped GML, so it is not directly machine readable, but after unzipping a machine-readable dataset in GML is obtained (the size of the dataset is 1.5 GB). The situation is the same for the next two biggest datasets: 'Lauku bloki', which accumulates information about agricultural land, and 'Meža zeme', which should provide all information about forest land use. Neither dataset covers the area of interest (see Fig. 4 and Fig. 5).
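The unzip-then-read step for a large GML file can be sketched as follows. This is only an illustration under stated assumptions: the GML 3.2 namespace, the file name `soil.gml` and the tiny sample content are hypothetical and are not taken from the catalogue. The sketch reads the GML member of the archive incrementally, which matters for a dataset of about 1.5 GB.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the dataset uses the GML 3.2 namespace; adjust if it differs.
GML_NS = "http://www.opengis.net/gml/3.2"


def count_polygons(zip_source):
    """Count gml:Polygon elements in the first .gml member of a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        gml_name = next(
            name for name in archive.namelist() if name.lower().endswith(".gml")
        )
        count = 0
        with archive.open(gml_name) as stream:
            # iterparse reads the file incrementally instead of loading it whole.
            for _event, elem in ET.iterparse(stream, events=("end",)):
                if elem.tag == f"{{{GML_NS}}}Polygon":
                    count += 1
                # Drop the element's children and text once it has been seen.
                elem.clear()
        return count


def make_sample_zip():
    """Build a tiny in-memory zipped GML file (hypothetical content) for a demo."""
    gml = (
        f'<gml:FeatureCollection xmlns:gml="{GML_NS}">'
        "<gml:featureMember><gml:Polygon/></gml:featureMember>"
        "<gml:featureMember><gml:Polygon/></gml:featureMember>"
        "<gml:featureMember><gml:Point/></gml:featureMember>"
        "</gml:FeatureCollection>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("soil.gml", gml)
    buffer.seek(0)
    return buffer


print(count_polygons(make_sample_zip()))
```

In practice the path of the downloaded archive would be passed to `count_polygons` instead of the in-memory sample.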

Latvia's Open Data portal
According to Bojārs and Liepiņš (2014), the Latvian open data community and an open data catalogue have existed since around 2014. Since 2017, however, an Open Data portal (WEB (f)) has been available that complies with the DCAT W3C recommendation (WEB (i), 2014). The portal currently contains 74 datasets from 20 data providers. Some environmental datasets exist in CSV and SHP formats. CSV datasets are available through the API. All datasets are disseminated under the CC0 license. The content of the catalogue can be serialized as RDF, n3, xml, jsonld and ttl representations (see Table 3).
In addition, the portal gives API-level access to data and metadata, including data search using SQL passed directly in the URL, and data manipulation. For example, all records from the resource with Riga address points where the street name is 'Pavasara gatve' can be retrieved with the following request: https://data.gov.lv/api/action/datastore_search_sql?sql=SELECT * from "54ced227-e043-486c-a4c9-d6b2dc241c4b" WHERE "iela" = 'Pavasara gatve'. It is important to note that the OECD also has APIs that provide access to datasets in the catalogue of OECD databases and allow the data to be queried in several ways, using parameters to specify the request (WEB (j)); a catalogue of OECD indicators also exists.
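The SQL request above contains spaces and quotes, so it has to be URL-encoded before it is sent. The following minimal sketch builds the encoded URL; the endpoint, resource identifier and column name come from the request quoted above, while the helper name `build_sql_search_url` is hypothetical. No network call is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_URL = "https://data.gov.lv/api/action/datastore_search_sql"
RESOURCE_ID = "54ced227-e043-486c-a4c9-d6b2dc241c4b"  # Riga address points (from the text)


def build_sql_search_url(resource_id, column, value):
    """Return an encoded URL for a simple equality query (hypothetical helper)."""
    # Double any single quotes so the value stays a valid SQL string literal.
    escaped = value.replace("'", "''")
    sql = f"SELECT * from \"{resource_id}\" WHERE \"{column}\" = '{escaped}'"
    return f"{API_URL}?{urlencode({'sql': sql})}"


url = build_sql_search_url(RESOURCE_ID, "iela", "Pavasara gatve")
print(url)
# Decoding the query string recovers the original SQL statement.
print(parse_qs(urlsplit(url).query)["sql"][0])
```

The resulting URL can then be fetched with any HTTP client; the structure of the response is not shown here because it is not described in the text.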

Planned Land Use information system (TAPIS)
TAPIS consists of two main modules: 1) a module for spatial planning authorities and 2) a module for public users. The module for planning authorities provides: a) a predefined development process for spatial development planning documents; b) development, publication and maintenance of documents at all planning levels; c) data exchange with state information systems; d) organization of public discussions; and e) some public electronic services. The public user module, in turn, provides the following capabilities for residents and merchants: a) searching for textual and spatial information on Planned Land Use; b) participating in public discussions on Planned Land Use; c) receiving notices on an area of interest; and d) receiving statements from municipalities as e-services (WEB (k)). In addition, TAPIS provides Planned Land Use WMS and WFS (see Table 4).
WMS is a raster service, whereas WFS is a machine-readable feature service. Information about Planned Land Use is available in a machine-readable way (see Fig. 6) for 23% (17794 km2) of the territory of Latvia (as of March 2017). The resource URL suggests that the dataset was created to implement the INSPIRE directive, but according to the INSPIRE Geoportal Validator the average degree of conformity of the INSPIRE metadata is 38.89% and the average degree of interoperability of the INSPIRE resources is 0.00%. Nevertheless, the information is available in a machine-readable way and overlaps the area of interest (see Fig. 7). It is also very convenient that, through WFS, we can retrieve only the information that lies in our area of interest. This is done by sending bounding box parameters in the request; see Table 5 for an example.
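A bounding box request of this kind can be sketched as follows, assuming a WFS 2.0.0 key-value GetFeature request. The endpoint, feature type name and coordinates below are hypothetical placeholders and are not the values from Table 5. The use of EPSG:3059 (LKS-92 / Latvia TM) as the coordinate reference system is also an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: coordinates are given in LKS-92 / Latvia TM (EPSG:3059).
DEFAULT_CRS = "urn:ogc:def:crs:EPSG::3059"


def build_getfeature_url(endpoint, type_name, bbox, crs=DEFAULT_CRS):
    """Build a WFS 2.0.0 GetFeature URL limited to a bounding box."""
    minx, miny, maxx, maxy = bbox
    if minx >= maxx or miny >= maxy:
        raise ValueError("bbox must be (minx, miny, maxx, maxy) with min < max")
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        # WFS 2.0 KVP encoding: four corner values followed by the CRS.
        "bbox": f"{minx},{miny},{maxx},{maxy},{crs}",
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, feature type and coordinates, for illustration only.
url = build_getfeature_url(
    "https://example.org/wfs",
    "plu:PlannedLandUse",
    (470000, 300000, 471000, 301000),
)
print(url)
print(parse_qs(urlsplit(url).query)["bbox"][0])
```

Servers differ in the CRS notation and axis order they accept, so the `bbox` value may need adjusting for a particular service.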

Ecosystem Services Indicators
ES mapping and evaluation in the Jaunkemeri pilot area were carried out from November 2015 to May 2016. The ES mapping and evaluation materials were expected to support spatial planning for the Jaunkemeri pilot area and to help forecast different scenarios for the area (WEB (g)). Burkhard's concept for land-cover-based assessments (Burkhard et al., 2009) was also used to evaluate ES. In total 19 indicators are used: 4 for Provisioning ES, 10 for Regulating ES and 5 for Cultural ES. For each ES indicator, experts created an indicator passport describing the method(s) used to calculate the indicator. A 6-point scoring system is used for all indicators. When the assessment of all indicators was completed, the ES indicator values were used to create 19 maps, one for each indicator (see Fig. 8). Three ES indicators, one from each ES category, were selected for comparison with information from Latvian open repositories: A1 'Forest berry yield' from Provisioning ES, B12 'Carbon capture potential index' from Regulating ES and C2 'Leisure (active and passive) potential' from Cultural ES.

Results
The data from PLU IS/TAPIS was selected as the most suitable for comparison with the mapped ES indicators. First, a scenario for data processing was created. The scenario compares every planned land use with every mapped assessment of each indicator (see Table 6). To detect correlations (feature size and configuration) between two spatial datasets, a 10 m x 10 m grid was laid over the datasets. The mesh size can be adjusted according to the level of detail of the spatial datasets and the available computational performance. For the spatial datasets being compared, an Id was assigned to each cell of the grid (see Fig. 9). For the features of each spatial dataset that intersect the grid, the Fill percentage was calculated for each cell. For example, Fig. 9 shows that for the cell with Id 5 the Fill percentage is 78% for Assessment Nr.2 of the indicator 'Active and passive recreation opportunities' and 66% for the feature '4_TransportNetworksLogisticsAndUtilities'/Road from Planned Land Use.
After the Fill percentage has been calculated for all cells of both spatial datasets, the acquired information can be represented as a table. For example, Table 7 shows the Fill percentage for all cells from Fig. 9. The correlation between Feature1 and Feature2 is 0.78, from which we conclude that there is a very strong positive relationship between the two datasets. The correlation is this strong in our example because the expert used cartographic information about roads to shape the feature for Assessment nr.2 of the indicator 'Active and passive recreation opportunities'. However, the method described above can be applied to compare any spatial datasets and can be useful for detecting hidden correlations between spatial datasets that at first glance appear unrelated.
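The Fill percentage calculation can be illustrated with a deliberately simplified sketch. It assumes that a feature is an axis-aligned rectangle, that the grid origin is at (0, 0) and that cell Ids are assigned row by row starting from 1. None of these assumptions comes from the study, where real polygon features were intersected with the grid in GIS software.

```python
def fill_percentages(rect, cols, rows, cell=10.0):
    """Return {cell_id: fill %} for a rectangle laid over a cols x rows grid.

    rect is (minx, miny, maxx, maxy); each cell is cell x cell metres.
    """
    rx0, ry0, rx1, ry1 = rect
    cell_area = cell * cell
    result = {}
    for row in range(rows):
        for col in range(cols):
            cx0, cy0 = col * cell, row * cell
            cx1, cy1 = cx0 + cell, cy0 + cell
            # Width and height of the overlap between the cell and the feature.
            width = max(0.0, min(rx1, cx1) - max(rx0, cx0))
            height = max(0.0, min(ry1, cy1) - max(ry0, cy0))
            cell_id = row * cols + col + 1
            result[cell_id] = 100.0 * width * height / cell_area
    return result


# Hypothetical feature covering x from 5 m to 25 m across a single row of three cells.
print(fill_percentages((5.0, 0.0, 25.0, 10.0), cols=3, rows=1))
```

Each dataset being compared yields one such column of values, so the cell Id acts as the join key between the two datasets.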
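The text does not state which correlation coefficient was used. The sketch below assumes the Pearson coefficient, computed over the per-cell Fill percentages of two features. The sample values are hypothetical and are not the values from Table 7.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Fill percentages per grid cell for two features.
feature1 = [0.0, 20.0, 80.0, 100.0, 40.0, 0.0]
feature2 = [0.0, 10.0, 60.0, 90.0, 70.0, 5.0]
print(round(pearson(feature1, feature2), 2))
```

A value close to 1 means that cells largely filled by one feature tend to be largely filled by the other as well.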
As described above, all pairs of PLU and mapped assessment in each indicator were compared. As a result, the PLU dataset (see Fig. 7) was compared with each assessment of the three ES indicators. This was done by creating a 10 m x 10 m grid (a database table) that overlays the whole pilot area. The grid contains 9468 cells (records). Every record contains information about how many square metres are allocated to each PLU or ES indicator assessment (10 columns in each record). For the 91 ha territory, approximately 85000 parameters were analysed. Correlations were calculated for all pairs of ES indicator assessment and PLU. For most pairs the p-values are significant (less than 0.05) at the 0.95 confidence level. The following software was used for this analysis: QuantumGIS for geoprocessing, pgAdmin4 for SQL processing at the database level and R for statistical analysis. After analysing the correlation coefficients described above, the following conclusions can be made: a) there is a very strong positive relationship (c2_2 x plu4) between the territory of the second assessment of the indicator 'Leisure (active and passive) potential' and the territory 'Transport networks, logistics and utilities' from the PLU dataset; b) there is a moderate relationship (a1_2 x plu4 and c2_4 x plu6) between the territory of the second assessment of the indicator 'Forest berry yield' and the territory 'Transport networks, logistics and utilities' from the PLU dataset, and between the territory of the fourth assessment of the indicator 'Leisure (active and passive) potential' and the territory 'Other uses' from the PLU dataset; c) there is a weak relationship for a1_2 x plu6, b12_1 x plu4 and plu6, c2_2 x plu6, c2_3 x plu3 and c2_4 x plu4. See Table 8 for all significant relationships. The method described above allows correlations to be detected between any spatial dataset and the mapped assessments of Ecosystem Provisioning, Regulating and Cultural services indicators. This is especially important for Regulating and Cultural ES, because the linkage between Regulating and Cultural ES and spatial datasets (cartographic information) is not always obvious. The method can be applied when developing alternative development plans in sustainable land development (see the bullet 'Alternative development plans' in Fig. 13) or in online decision making (as hints), with the aim of selecting the best development scenario for a specific area or territory.
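The significance check for a correlation coefficient can be sketched as follows. The study used R for this step and the exact test is not stated, so this is an illustration only: it computes the usual t statistic for a Pearson coefficient and, as an assumption that is reasonable for a sample as large as the 9468 grid cells, approximates its distribution by the standard normal distribution.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a correlation r observed on n cells.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a normal approximation,
    which is only appropriate for large n.
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if abs(r) == 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(t) / math.sqrt(2.0))


N_CELLS = 9468  # number of grid cells reported in the text
for r in (0.01, 0.05, 0.78):
    p = correlation_p_value(r, N_CELLS)
    print(r, "significant" if p < 0.05 else "not significant")
```

With this many cells even small coefficients can be statistically significant, so the strength of a relationship should be judged from the coefficient itself and not from the p-value alone.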

Moderate positive relationship
Recognising these relationships would be important for decision making on land use policy design, including the assessment of alternative policies and rule-based land development.
There is currently great interest in rule-based information systems and their development (Kalibatiene and Vasilecas, 2010). In the conceptual modelling step of an information system (for example, the Planned Land Use information system), researchers are challenged to transform the application domain ontology into a conceptual data model, since their conceptualizations of the real world are similar (Kalibatiene and Vasilecas, 2010). Kalibatiene and Vasilecas also point out that, to define application domain rules, consensus should be reached among all domain stakeholders on which rules should be used and what they mean; after the rules are defined, it is important to determine which of them should be implemented in the information system (Kalibatiene and Vasilecas, 2010). An analysis of consistency rules in different IS models shows that most rules are expressed in natural and formal language, and that rules expressed in natural language may be interpreted ambiguously (Kalibatiene, Vasilecas and Dubauskaite, 2013). Here it is important to understand that not all rules can be implemented at the IS level; some of the rules can be implemented only in legislative form. Cooperation with legislative bodies and support for introducing innovative ideas into production are very significant (Cevere and Gailums, 2017). When doing this, we should remember that it may be necessary to use this information as a data source. To address information monitoring and assessment problems, automated analysis of information from open access sources can be performed using ICT tools (Fomin et al., 2017). For this purpose, Fomin et al. propose using the 'Open language infrastructure' concept and developing ICT tools for media monitoring, including the possibility of clustering and classifying information. Such clustering and classification can help to identify new events on the basis of media content, which in turn can assist decision making in emergency situations. Moreover, clustered and systematized information from the media can be used as a data source for the assessment of Cultural ES.

Fig. 13. Information system's architecture for land development (Holms et al., 2017).
Bumans acknowledges that there are several practices for mapping relational databases to RDF schema (Bumans, 2010). Moreover, Mazzieri and Dragoni proposed in 2005 that RDF syntax be extended so that a value is added to the triple (Mazzieri and Dragoni, 2005). In our research we verified that adding a value to RDF triples can be very convenient for describing relationships between spatial datasets (including, for example, relationships between spatial datasets and the mapped assessments of ES indicators); see Table 9. Moreover, the possibility was highlighted of describing the relationship between data in datasets as RDF triples with a fuzzy predicate. This approach is based on the concept of Fuzzy Logic. Fuzzy Logic, in turn, is based on the knowledge that reality is inexact rather than precise, because all human affirmations have a certain free interpretation domain. Traditional binary logic is a special case of fuzzy logic that operates with only two values of interpretation: 0 or 1, yes or no. In contrast to the well-defined sets of Set Theory, real existing sets have rather fuzzy limits, essentially due to the uncertainties in the language used. A set is fuzzy limited if the value one is not assigned to all members of the set. A fuzzy set is defined by the so-called membership function, which can take any value in the interval [0, 1], not only 0 or 1. The key notion when modelling with Fuzzy Logic is the linguistic variable (Tulbure, 2013).
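The idea of a triple extended with a value can be illustrated with a small data structure. This is only a sketch: the `ex:` identifiers, the predicate name and the one-line text form are hypothetical, and they reproduce neither Table 9 nor any standard RDF serialization. The value is restricted to the interval [0, 1], following the definition of the membership function above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValuedTriple:
    """A subject-predicate-object triple extended with a degree in [0, 1]."""

    subject: str
    predicate: str
    obj: str
    value: float

    def __post_init__(self):
        # A membership degree outside [0, 1] has no meaning for a fuzzy set.
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("value must lie in the interval [0, 1]")

    def as_line(self):
        """Illustrative one-line text form (not a standard RDF syntax)."""
        return f"{self.subject} {self.predicate} {self.obj} {self.value:.2f} ."


# Hypothetical identifiers; 0.78 is the example correlation quoted in the Results.
triple = ValuedTriple("ex:feature1", "ex:correlatesWith", "ex:feature2", 0.78)
print(triple.as_line())
```

A binary statement is the special case in which the value is exactly 0 or 1.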

Conclusions
• It is possible to determine whether a statistically significant correlation exists between spatial datasets (cartographic information) and Ecosystem Provisioning, Regulating and Cultural services.
• To describe this relationship in a machine-readable way, it would be convenient to use fuzzy semantics: RDF triples appended with a value.
• The proposed method allows correlations to be detected between any spatial dataset and the mapped assessments of Ecosystem Provisioning, Regulating and Cultural services indicators, which is important for decision making on land use policy design, including the assessment of alternative policies and rule-based land development.
• The method described in section 4 can be applied when developing alternative development plans in sustainable land development (see Fig. 13) or in online decision making (as hints), with the aim of selecting the best development scenario for a specific area or territory.

The Latvian metadata catalogue declares that it conforms to the OpenGIS CSW standard version 2.0.2 and to 'COMMISSION REGULATION (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services'. The catalogue is accessible in a machine-readable way as a CSW or INSPIRE Discovery service by URL (WEB (e)).

Fig. 6. Areas where Planned Land Use data is available in a machine-readable way

Fig. 10. Correlogram for A1 and PLU (confidence level = 0.95)

The provider of the Latvian Geospatial Information Agency metadata catalogue is the Latvian Geospatial Information Agency (LGIA). This catalogue can be used as an access point to the web services provided by the LGIA. The LGIA catalogue declares that it conforms to the OpenGIS CSW standard version 2.0.2 and to 'COMMISSION REGULATION (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services'. The catalogue is accessible in a machine-readable way as a CSW or INSPIRE Discovery service by URL (WEB (d)).

Table 1. Land Cover resources from LGIA metadata catalogue

Table 2. Records about Latvia soil from Latvian metadata catalogue

Table 3. Data.gov.lv machine readable access points

In 2009 a report on measuring the relationship between ICT and the environment was published. It stated that no statistical field covering the relationship between ICT and the environment was observed, although ICT statistics and environmental statistics are separately recognized fields (Organisation for Economic Co-Operation and Development (OECD), 2009). Some data harmonization issues are highlighted; for example, global data cannot be compiled into global reports, because different reporting requirements exist in some countries. Some indicators are defined for ICT and the environment, but at first glance these indicators are not easily harmonizable with the ES classification. It was emphasized that it is important to design indicators and classifications that sustain cross-border and cross-industry interoperability.

Table 4. Access points for Planned Land Use Map and Feature services

Table 6. Data processing plan and abbreviations used

Table 7. Fill percentage for Feature1 'Active and passive recreation opportunities' Assessment nr.2 and Feature2 'Road from PLU', according to Fig. 9.

Table 8. Relationships between ES Indicators and PLU

Table 9. Example of valued RDF triple