Publishing China satellite data on the GEOSS Platform

ABSTRACT This paper is the first of a series that describes some of the main dataset resources presently shared through the GEOSS Platform. The GEOSS Platform was created as the technological tool to implement the Global Earth Observation System of Systems (GEOSS); it is a brokering infrastructure that presently brokers more than 190 autonomous data catalogs and information systems. The paper analyses the China Satellite datasets and describes the data publishing process from the China GEOSS Data Provider to the GEOSS Platform, covering both the administrative and the technical registration. The China Satellite datasets are considered among the most important satellite data resources shared through the GEOSS Platform. The analysis also provides some insights into GEOSS user searches for China Satellite datasets.


Introduction
The Group on Earth Observations (GEO) is an intergovernmental partnership working to improve the availability, access, and use of open Earth observations, including remote sensing (e.g. satellite and airborne observations) and in situ data, to impact policy and decision-making in a wide range of sectors (GEO, 2022). Established in 2005, GEO is today a partnership of more than 100 national governments and more than 100 Participating Organizations that envisions a future where decisions and actions for the benefit of humankind are informed by coordinated, comprehensive and sustained Earth observations. In 2015, the GEO Plenary (convened in Mexico City) reaffirmed that GEO is committed to the coordination of Earth observation systems and to building the Global Earth Observation System of Systems (GEOSS) for the benefit of humankind (GEO, 2015).
GEOSS is a social and software ecosystem sharing independent and open Earth observation information and processing services. The GEOSS Platform, formerly called the GEOSS Common Infrastructure (GCI), was created as the technological tool that contributes to implementing GEOSS (Boldrini, Hradec, Craglia, & Nativi, 2021; Craglia, Hradec, Nativi, & Santoro, 2017; Nativi et al., 2015). The GEOSS Platform is the cornerstone around which the GEOSS software ecosystem is implemented. It is the "glueware" that enables the connection and coordination of the many autonomous and multiorganizational systems and services contributing to GEOSS (GEOSS Infrastructure Development Task Team, 2017). Over the years, the GEOSS Platform has evolved considerably to address new challenges and leverage technological developments. A new advanced GEOSS Platform is under development to embrace the opportunities offered by the digital transformation of society (Guo et al., 2020; Nativi, Mazzetti, & Craglia, 2021; Santoro, Mazzetti, & Nativi, 2020). Notably, the new GEOSS Platform will enable model sharing, as demonstrated at the last GEO Plenary (Ollier, 2019).

Resources publishing in the GEOSS Platform ecosystem
Through the GEOSS Platform, users and software clients can discover and access the required Earth Observations (EO), leveraging the enhanced search and discovery capability of the GEOSS Portal and a set of APIs. Resources keep the format distributed by the Provider, which can be a GIS-usable format or a knowledge-type resource such as documents and research analyses. Besides discovering and accessing the required resources, users are also exposed to a large variety of EO resources from a multitude of Providers; this can broaden the knowledge of users looking for specific sets of resources by revealing additional sources not considered before.
The procedure to join the GEOSS Platform ecosystem and become a data/service Provider is rather simple and consists of two main steps: an administrative and a technological registration (see Figure 1). To implement technological interoperability, preserve the autonomy of data/service Providers, and leave the ecosystem as a whole free to evolve, the GEOSS Platform adopts a service brokering approach (Craglia, Nativi, Santoro, Vaccari, & Fugazza, 2011; Nativi, Craglia, & Pearlman, 2013; Santoro, Nativi, & Mazzetti, 2016).
In a nutshell, the GEOSS Platform is an infrastructure for brokering data services. To this aim, the infrastructure implements the necessary administration, policy, and technological interoperability arrangements for metadata/data mediation and harmonization. The GEOSS Platform services allow Data Providers to share their datasets while leaving them autonomous as to the data policies they apply and the standard technologies they implement. The GEO Secretariat (GEOSec) is the entry focal point for all engagements with a new (as well as an already registered) Data Provider. GEOSec guides Data Providers through the two steps that are necessary to register and join the GEOSS Platform ecosystem. This interaction aims to ensure a successful dialogue and effective interoperability. A new Data Provider must go through (GEOSS Infrastructure Development Task Team, 2017):
• Administrative Registration: to accomplish this task, the Data Provider must interact with the Yellow Page (YP) component of the GEOSS Platform. The GEOSS YP services catalogue the Data Provider organization as a new formal contributor to GEOSS, requesting the necessary information on the administrative, policy, data content, and technological aspects, e.g. the published resource types, the administrative contact point, the online data service endpoints, and the applied policy for data access. The (recently improved) registration procedure aims to provide Data Providers with a clear, simple, and transparent online form to be filled in (see supplemental Annex A). The form was designed for people who are not IT experts. The form entries are validated by the YP. While compiling the form, the Data Provider can contact the GEO Secretariat for consultation and to resolve any doubt. For GEOSS Data Providers, inclusion in the Yellow Pages brings some benefits, including:
(a) To get visibility of their contribution to the GEOSS Platform (for example, by adding their logos, description, etc.) via a dedicated functionality of the GEOSS Portal (see Note 1).
(b) To associate their datasets to specific UN SDGs; this also improves their discoverability and usability via the GEOSS Portal.
(c) To self-assess their adherence to the GEO Data Sharing and Management Principles, improving the re-usability of their data.
• Technical Registration: to finalize this task, a Data Provider must interact with the GEO DAB component of the GEOSS Platform. The GEO DAB services set up and test the interoperability level of the data services published by the Data Provider. This is accomplished by instantiating and running the necessary brokering arrangements to connect the Data Provider infrastructure to the GEOSS Platform ecosystem. The brokering arrangements depend on the data discovery and access standards (i.e. service protocols/APIs) that are published by the infrastructure of the Data Provider. The process consists of the following phases:
(a) Set-up phase: the interoperability test (i.e. the brokering process) is defined and set up; the GEO DAB team tests the interoperability level of the data discovery and access services/APIs published by the Data Provider, making use of the standards already in use by the Data Provider.
(b) Testing phase: the interoperability tests are executed, and a report is produced to highlight possible elements to be fixed and/or fine-tuned by both parties; this step is repeated until the pursued level of interoperability is achieved.
(c) Publication phase: upon clearance by the Data Provider, the new discovery and (possibly) access services are added to the GEOSS Platform metadata catalogue, becoming accessible to GEO users.
(d) Production phase: finally, GEO and the new Data Provider decide to move the new ecosystem provision to the operational (or production) GEOSS instance and to maintain it according to the GEOSS service-level agreement. A public press communication may be jointly released.
This manuscript is the first of a series that describes some of the main dataset resources presently shared, through the GEOSS Platform, according to the GEOSS data sharing principles (GEO, 2015). The authors discuss the publishing and accessibility challenges addressed by the GEO Data Providers to share their datasets utilizing the services and APIs offered by a complex System of Systems Platform like GEOSS. As presented in the manuscript, issues may deal with policy, social, and technological aspects.
The next section introduces the China Satellite Datasets. The subsequent sections discuss the implementation of the two registration steps introduced above. Finally, a concluding section discusses the significant contribution made by the China GEOSS Data Provider, through an analysis of relevant usage statistics, and the possible enhancements to improve the discoverability and accessibility of China GEOSS datasets.

China satellite datasets
Over the last decades, China has launched a series of Earth observation satellites, including the FengYun (FY), HaiYang (HY), China-Brazil Earth Resources Satellite (CBERS), HuanJing (HJ), and GaoFen (GF) series, to dynamically and continuously expand its capacity for capturing Earth resource information. Satellite data are widely used in meteorological monitoring, disaster reduction, agriculture, forestry, and environmental protection. The ChinaGEOSS Data Sharing Network (ChinaGEOSS DSNet) is a national-level Earth observation platform dedicated to strengthening the integration of China's existing Earth observation systems to serve both China and the world community (Zhang et al., 2019). It was created in 2011 by the China GEO Inter-Ministerial Coordination Group and is led by the National Remote Sensing Center of China (NRSCC) of the Ministry of Science and Technology of China. Since 2016, ChinaGEOSS DSNet has annually selected high-quality Earth observation datasets from domestic satellite data and scientific products and published their metadata records to the GEOSS Portal through the GEO Discovery and Access Broker (GEO DAB). As of January 2022, more than 4.5 million Chinese Earth observation resources can be discovered, queried and accessed in the GEOSS Portal (Tables 1 and 2).

Administrative (Yellow Page) registration
The administrative registration was completed in March 2017 by the China GEOSS Data Provider. Table 3 shows the required information and the contents filled in by the China GEOSS Data Provider. Protected personal data (contact point names and emails) are not shown, in compliance with the General Data Protection Regulation (GDPR). The service endpoint is omitted for policy reasons and because its status has changed since the initial registration.

Interoperability (GEO DAB) registration
The interoperability registration is a brokering process carried out through the GEO DAB component, a middleware in charge of interconnecting the heterogeneous and distributed capacities contributing to GEOSS. The GEO DAB realizes an abstract and harmonized view of the diverse data/metadata by mapping the different (Data Provider specific) metadata models into an ISO 19115-based internal model, which is rich and extensible enough to describe geographical datasets in detail.
ISO 19115 is a standard that defines more than 400 metadata elements and includes an extension mechanism to create additional metadata sections, elements, and code lists. Data Provider services are periodically harvested by the GEO DAB to collect the original metadata records, harmonize them to the internal model, and consolidate the information in a central database. This process enables harmonized discovery of datasets across the disparate and heterogeneous GEOSS Data Providers.
In particular, the brokering process is used to establish a connection with the remote China Satellite system in terms of discoverability and accessibility. The GEO DAB publishes 16 dataset Collections from the different satellite missions and scientific products, including 4,626,807 datasets representing the products/granules. All the metadata records present in the China GEOSS data system use a JavaScript Object Notation (JSON) based format and are harvested (stored in a central database) through the HTTP protocol. The main product metadata fields coming from the China Satellite data that are used for the mapping into the GEO DAB during the harvest operation are:
• satelliteID: identifies the satellite mission and is used to determine to which Collection the products belong;
• sensorID: identifies the product type;
• bottomLeftLatitude, bottomLeftLongitude, bottomRightLatitude, bottomRightLongitude: indicate the lower spatial coordinates;
• topLeftLatitude, topLeftLongitude, topRightLatitude, topRightLongitude: indicate the upper spatial coordinates;
• receiveTime: represents the time range;
• dataOwner: identifies the organization's name;
• fileSpecification/dataID: data identifiers used to build the title of the record;
• dataURL/fileStorePath: contains online information to access or download the product;
• productFormat: indicates the product data format;
• fileSize: indicates the product size expressed in Megabytes (MB).
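The mapping step can be illustrated with a minimal Python sketch. It is not the GEO DAB implementation: the source field names are those listed above, while the target keys, the sample values, the precedence between fileSpecification and dataID, and the min/max derivation of the bounding box (which ignores footprints crossing the antimeridian) are assumptions made for illustration.

```python
import json

# Source field names come from the list above; everything else is hypothetical.
LAT_FIELDS = ("bottomLeftLatitude", "bottomRightLatitude",
              "topLeftLatitude", "topRightLatitude")
LON_FIELDS = ("bottomLeftLongitude", "bottomRightLongitude",
              "topLeftLongitude", "topRightLongitude")


def harmonize(record):
    """Map one provider JSON record to a simplified harmonized record."""
    lats = [float(record[name]) for name in LAT_FIELDS]
    lons = [float(record[name]) for name in LON_FIELDS]
    return {
        # satelliteID decides which Collection the product belongs to.
        "collection": record["satelliteID"],
        "product_type": record["sensorID"],
        # Assumption: the four corners are reduced to an axis-aligned box.
        "bbox": {"west": min(lons), "south": min(lats),
                 "east": max(lons), "north": max(lats)},
        "time": record["receiveTime"],
        "organization": record["dataOwner"],
        # Assumption: fileSpecification is preferred over dataID for the title.
        "title": record.get("fileSpecification") or record.get("dataID"),
        "online_resource": record.get("dataURL") or record.get("fileStorePath"),
        "format": record["productFormat"],
        "size_mb": float(record["fileSize"]),
    }


# Invented sample record, for illustration only.
sample = json.loads("""{
  "satelliteID": "GF1", "sensorID": "WFV1",
  "bottomLeftLatitude": "30.0", "bottomLeftLongitude": "110.2",
  "bottomRightLatitude": "29.8", "bottomRightLongitude": "112.5",
  "topLeftLatitude": "32.1", "topLeftLongitude": "110.0",
  "topRightLatitude": "31.9", "topRightLongitude": "112.3",
  "receiveTime": "2020-05-01T03:12:45Z",
  "dataOwner": "Example Organization",
  "fileSpecification": "EXAMPLE_SCENE_0001",
  "dataURL": "ftp://ftp.example.org/GF1/EXAMPLE_SCENE_0001.hdf",
  "productFormat": "HDF", "fileSize": "512.5"
}""")
harmonized = harmonize(sample)
```

Reducing the eight corner values to a single bounding box keeps the harmonized record simple to query by geographical coverage, at the cost of losing the exact footprint shape.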
In terms of discoverability, the GEOSS Platform allows users to retrieve datasets through queries based on geographical coverage, temporal extent, and specific queriables like satellite product type corresponding to sensorID field values. Figure 2 shows the China GEOSS datasets Collections available in the GEOSS Portal.
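As a rough illustration of such a query, the sketch below filters already-harvested records by bounding box, time window, and sensorID. It is a simplified in-memory stand-in, not the GEOSS Platform query engine; the record layout, the function names, and the sample values are hypothetical.

```python
from datetime import datetime


def bbox_intersects(a, b):
    """Return True if two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover(records, bbox=None, start=None, end=None, sensor_id=None):
    """Return the records satisfying every constraint that is not None."""
    hits = []
    for rec in records:
        if bbox is not None and not bbox_intersects(rec["bbox"], bbox):
            continue
        if start is not None and rec["time"] < start:
            continue
        if end is not None and rec["time"] > end:
            continue
        if sensor_id is not None and rec["sensorID"] != sensor_id:
            continue
        hits.append(rec)
    return hits


# Invented records, for illustration only.
records = [
    {"id": "a", "sensorID": "WFV1", "bbox": (110.0, 29.8, 112.5, 32.1),
     "time": datetime(2020, 5, 1)},
    {"id": "b", "sensorID": "PMS", "bbox": (10.0, 40.0, 12.0, 42.0),
     "time": datetime(2021, 1, 15)},
]

hits = discover(records,
                bbox=(111.0, 30.0, 113.0, 31.0),
                start=datetime(2020, 1, 1),
                end=datetime(2020, 12, 31),
                sensor_id="WFV1")
```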
In terms of accessibility, the GEOSS Platform allows users to download the Chinese data products through the FTP protocol. Initially, the China GEOSS service provided metadata only and the data were not accessible; subsequently, the Data Provider progressively updated its service, providing the information necessary to download the data. In the first stage, the GEO DAB implemented a specific solution to retrieve the absolute URL path for downloading data using the following elements:
username:password @ FTP site + dataURL + fileStorePath + fileSpecification
Currently, the China GEOSS service exposes the absolute URL path for downloading data in the dataURL field.
In some cases, due to policy agreements, direct data download from the GEOSS Platform may not be possible. In these cases, users are redirected to the Provider's data system. In general, the China Satellite data system requires the use of credentials (username and password) to download products. Most products can be downloaded as Hierarchical Data Format (HDF) files, a format commonly used for satellite-based datasets.
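The first-stage URL composition can be sketched in Python as follows. The text only lists the elements and their order; the `ftp://` scheme, the use of `/` as separator, the percent-encoding of the credentials, and the function name are assumptions for illustration, not the actual GEO DAB code.

```python
from urllib.parse import quote


def build_download_url(username, password, ftp_site,
                       data_url, file_store_path, file_specification):
    """Compose an absolute FTP URL from the metadata elements."""
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    # Assumption: the three path elements are joined with single slashes.
    parts = (data_url, file_store_path, file_specification)
    path = "/".join(part.strip("/") for part in parts if part)
    return f"ftp://{credentials}@{ftp_site}/{path}"


# Invented values, for illustration only.
url = build_download_url("user", "p@ss", "ftp.example.org",
                         "/data", "GF1/2020/", "scene.hdf")
# url == "ftp://user:p%40ss@ftp.example.org/data/GF1/2020/scene.hdf"
```

Once the provider exposes the absolute path directly in dataURL, a helper of this kind is no longer needed and the field value can be used as-is.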
The GEO DAB exposes a set of high-level client-side Open APIs that can be used to discover and access GEOSS resources via the GEOSS Platform. The GEO DAB publishes three types of APIs:
• Standard geospatial interfaces: de jure and de facto standard interfaces for geospatial data discovery, access and visualization, such as OGC CSW, WFS, WCS, WMS, OAI-PMH, etc.
• Server-side API: a set of server-side APIs, including OpenSearch and RESTful APIs. RESTful APIs allow an easy interaction with the GEO DAB through any programming language that supports the HTTP protocol and the JSON message encoding.
• Client-side (Web) API: an HTML5+JavaScript+CSS library that facilitates the development of web and mobile applications, hiding the interaction with the GEO DAB behind the behavior of simple JavaScript objects.

Discussion and considerations
Analyzing the China GEOSS datasets shared via the GEOSS Platform can be useful to understand how the datasets are used and whether their discoverability and accessibility could be improved to enhance the user experience.
The analysis provides statistical information about the use of China GEOSS datasets, taking into consideration human interactions (users of the GEOSS Portal). All the analyzed indicators cover the last 9 months (from 1 August 2021 to 30 April 2022). In this period, about 600 queries made by GEOSS Portal users matched one or more China GEOSS records, with an average of about 66 matching queries per month. Moreover, China GEOSS records were also distributed to organizations that periodically harvest the entire GEOSS content, such as the WMO WIS GISC.
An initial investigation concerned the characterization of user queries that matched China GEOSS records. We focused on queries having a keyword constraint to identify the keywords most frequently used to match China GEOSS datasets. Figure 3 reports such queries in a pie chart, showing the most frequent keywords and their percentages.
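Keyword shares of this kind can be derived from a query log with a few lines of Python. The sketch below is only an illustration of the computation: the log structure, the case-insensitive normalization, and the sample queries are assumptions, not the data or tooling used for the analysis.

```python
from collections import Counter


def keyword_shares(queries):
    """Return {keyword: percentage} over the queries that carry a keyword."""
    # Assumption: keywords are compared case-insensitively after trimming.
    counts = Counter(q["keyword"].strip().lower()
                     for q in queries if q.get("keyword"))
    total = sum(counts.values())
    # most_common() orders the result from most to least frequent keyword.
    return {kw: 100.0 * n / total for kw, n in counts.most_common()}


# Invented query log, for illustration only.
log = [
    {"keyword": "China"},
    {"keyword": "china "},
    {"keyword": "GaoFen"},
    {"bbox": (110.0, 29.8, 112.5, 32.1)},  # no keyword constraint: ignored
]
shares = keyword_shares(log)
```

Queries without a keyword constraint are excluded from the denominator, which matches the focus on keyword-constrained queries described above.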
The most popular keywords are China and China GEOSS, which together are present in more than 70% of the queries. Among the satellite data, the most searched keywords are GaoFen, Global Ecosystem and Environment Observation Analysis Research Cooperation (GEOARC), China-Brazil Earth Resources Satellite (CBERS2) and TanSat. Figure 4 illustrates the popularity of all 16 retrieved dataset Collections from the China GEOSS datasets. The China-Brazil Earth Resources Satellite (CBERS 1, 2, 2B) Collections are the most searched, with more than 26% of returned results, while the Global Ecosystem and Environment Observation Analysis Research Cooperation (GEOARC), FengYun-2 H and FengYun-3D Collections have lower numbers (about 3% of returned results) because they were added to the GEOSS Platform only in November 2021.
We tried to identify the organizations that searched most for China GEOSS records (by running the "whois" program on the IP addresses associated with the queries); however, this identification is not always possible and sometimes does not give meaningful results. Indeed, the majority of IP addresses are associated with Internet Service Providers (ISPs), which provide dynamic IP addresses to their users. Two factors may contribute to that: employees working remotely (also due to the COVID lockdowns) and the interest of citizens. For the cases where identification was possible, the top users come from research and academic organizations. Table 4 shows the top countries from which user requests to retrieve China GEOSS datasets originated. The top five countries are China, Italy, United States, Bulgaria and Germany.
China GEOSS is one of the leading Data Providers (behind only Copernicus Open Access Hub, CUAHSI-HIS and USGS Landsat 8), contributing about 6% of the total datasets in the GEOSS Platform and about 4% over the European hotspot (Boldrini, Hradec, Craglia, & Nativi, 2021). However, despite China GEOSS being one of the most important Data Providers in the GEOSS Platform, the overall metadata quality of its published datasets could be significantly improved (in particular the keyword and abstract fields). The metadata quality issue is shared by the other two top satellite Data Providers (i.e. Copernicus Open Access Hub and USGS Landsat 8); once solved, it would improve the discoverability of records and potentially attract a larger number of users.
Note 1. http://www.geoportal.org/community/guest/yellow-pages.